Abstract - Nowadays we are confronted with huge volumes of
data produced by organizations, banks, military centers,
hospitals, etc. Recommender systems were developed to deal
with the problems caused by such volumes of data. These
systems help users in many different fields to make the
right decision among a massive volume of information. By
analyzing user behavior, a recommender system suggests
appropriate items or services to users, for example in
electronic stores. The Internet also provides a vast amount
of data to users, and with the development of information
systems there is a greater need than before for systems
capable of directing users towards the goods and services
they desire. Without effective management of the aggregated
data, these data become a barrier to progress. Therefore,
in this article we propose a new approach based on a hybrid
method that uses two techniques, collaborative filtering
and content-based filtering. To evaluate the proposed
approach we used the standard MovieLens dataset. For the
proposed approach we offer four models, which are compared
by prediction accuracy and classification error.

Keywords: Recommender System, Collaborative 
Filtering, Content-Based Filtering, Hybrid Filtering, Spiking 
Neural Network 



I. INTRODUCTION

Recommender systems are nowadays an essential web
application [1] that offers useful information through
filtering. Many e-commerce websites, such as online movie
databases, use recommender systems. Recommender systems
have also been used in the field of social networks [2].
Recommender systems use a database of user preferences to
predict additional topics or products that a new user may
like [3]. There is a variety of filtering approaches for
recommender systems. The most prominent approach, which is
actually used by many real online bookstores, is to take
the behavior, opinions, and tastes of a large community of
other users into account. These systems are often referred
to as community-based or collaborative approaches [4].
Collaborative filtering systems typically require three
steps: first, obtaining user information (user ratings of
certain items, etc.); second, analyzing the similarity
between users and forming the set of nearest neighbors;
finally, generating the resulting recommendations [5].



Collaborative filtering algorithms are divided into two
main types: memory-based and model-based. Memory-based
algorithms make predictions based on the entire set of
items previously rated by users [6]. Model-based algorithms
use the ratings to learn a model, such as a neural network
or a Bayesian model, which is then used to predict ratings
[6]. Model-based collaborative filtering has been reported
as the best-performing method in recommender systems [7].
In short, the model-based approach uses the ratings to
learn a predictive model, whereas the memory-based method
directly uses the user-item ratings to predict the rating
of a new item [8].
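
The memory-based variant can be illustrated with a minimal
sketch. It assumes cosine similarity over co-rated items
and a similarity-weighted average of the neighbors'
ratings; these choices, the function names and the toy
ratings are illustrative assumptions and are not taken from
the cited works.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity over the items that both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(ratings, user, item):
    """Similarity-weighted average of the neighbors' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None  # None: no neighbor rated the item

# Hypothetical user-item rating matrix (users and movies are made up).
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 5, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}
prediction = predict_rating(ratings, "alice", "m3")
```

Because bob's ratings agree with alice's more closely than
carol's do, the prediction for m3 lies closer to bob's
rating than to carol's.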

Content-based filtering uses item information. At its
core, content-based recommendation is
based on the availability of item descriptions and a profile 
that assigns importance to these characteristics. If we 
think again of the bookstore example, the possible 
characteristics of books might include the genre, the 
specific topic, or the author [4]. 
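
A minimal sketch of this idea for the bookstore example
follows. It assumes that the profile weight of a
characteristic is the user's average rating of the items
having it; that weighting, the helper names and the catalog
are hypothetical.

```python
from collections import defaultdict

def build_profile(user_ratings, catalog):
    """Average the user's ratings per item characteristic (genre, author, ...)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for item, rating in user_ratings.items():
        for feature in catalog[item]:
            totals[feature] += rating
            counts[feature] += 1
    return {f: totals[f] / counts[f] for f in totals}

def score(profile, features):
    """Mean profile weight of the item's known characteristics (0.0 if none)."""
    known = [profile[f] for f in features if f in profile]
    return sum(known) / len(known) if known else 0.0

# Hypothetical item descriptions.
catalog = {
    "book_a": {"genre:crime", "author:smith"},
    "book_b": {"genre:romance", "author:jones"},
    "book_c": {"genre:crime", "author:lee"},
    "book_d": {"genre:romance", "author:jones"},
}
user_ratings = {"book_a": 5, "book_b": 1}

profile = build_profile(user_ratings, catalog)
unseen = [b for b in sorted(catalog) if b not in user_ratings]
ranked = sorted(unseen, key=lambda b: score(profile, catalog[b]), reverse=True)
```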

Knowledge based approaches are distinguished in that 
they have functional knowledge: they have knowledge 
about how a particular item meets a particular user need, 
and can therefore reason about the relationship between a 
need and a possible recommendation. The user profile can 
be any knowledge structure that supports this inference 
[9]. Two well-known techniques for knowledge-based
recommendation are constraint-based recommenders and
case-based recommenders [10].

Fig. 1 sketches a recommendation system as a black 
box that transforms input data into a ranked list of items 
as output. User models and contextual information, 
community and product data, and knowledge models 
constitute the potential types of recommendation input. 
However, none of the basic approaches is able to fully 
exploit all of these. Consequently, building hybrid 
systems that combine the strengths of different algorithms 
and models to overcome some of the aforementioned 
shortcomings and problems has become the target of 
recent research [4]. The combination methods include
weighted, switching, mixed, feature combination, cascade,
feature augmentation, and meta-level hybrids [9].
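
Two of the listed combination methods, weighted and
switching, are simple enough to sketch. The code below only
illustrates those two general methods; it is not the
integrated scheme proposed later in this article, and the
function names and the parameter alpha are assumptions.

```python
def weighted_hybrid(cf_score, cb_score, alpha=0.5):
    """Weighted hybrid: linear blend of collaborative and content-based scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * cf_score + (1.0 - alpha) * cb_score

def switching_hybrid(cf_score, cb_score):
    """Switching hybrid: use the collaborative score when one is available,
    otherwise fall back to the content-based score (e.g. for a new item)."""
    return cf_score if cf_score is not None else cb_score

blended = weighted_hybrid(4.0, 2.0, alpha=0.75)
fallback = switching_hybrid(None, 2.0)
```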

Related work is reviewed in Section II; the models used
are described in Section III. Section IV describes the
proposed approach and its steps. In Section V, the dataset,
the preprocessing and the experimental results are
described. Section VI concludes the article.



[Figure: the inputs (user profile and contextual
parameters, product features, knowledge models) feed a
recommendation component, which outputs a recommendation
list]

Figure 1. Recommender system as a black box [4]

II. RELATED WORK 

Recommender systems became an important research area in
the mid-1990s [11]. Examples of research on recommender
systems include [21], GroupLens [15], the recommendation of
books and other products at Amazon.com [12], MoviExplain
[18], E-MRS [19], Fab [35], news recommendation at VERSIFI
Technologies [14], and ORBIT [26].

Examples of hybrid recommender systems include a hybrid
recommender system based on a Multi-Layer Perceptron neural
network with collaborative filtering and content-based
filtering [16], [23]; a hybrid recommender system based on
Naive Bayes and item-based collaborative filtering [22];
and a hybrid recommender system that combines the
predictions of neighborhood-based collaborative filtering,
demographic filtering and content-based filtering [25]. In
2006, a hybrid approach based on collaborative filtering
(CF) and a neural network was proposed using the MovieLens
dataset [17]. In 2007, a web-based movie recommender system
was presented that uses three techniques: demographic
filtering, content-based filtering, and collaborative
filtering [20]. In 2011, a recommender algorithm for mobile
devices was presented [24].

III. BACKGROUND 

A. Neural Networks 

In this article, a Multi-Layer Perceptron neural network
and a spiking neural network are used to construct the
models. The Multi-Layer Perceptron network has been used in
various domains and is therefore not explained here. The
spiking neural network is considered the third generation
of neural networks [27]. According to research on spiking
neural networks, these networks are computationally more
powerful than other conventional neural networks [28] and
require fewer neurons than other networks to solve a
problem. Error backpropagation algorithms have been
developed for spiking neural networks [29]. A spiking
neuron fires at a given time by sending an electric pulse.
These pulses are called action potentials or spikes. A
spike is represented by its firing time, which is defined
by a function [30].

A spiking neural network is shown in Fig. 2. The input
layer of this network is defined by a collection of neurons
called H. The input layer emits input spikes that reach the
middle layer at different times. The middle layer is
defined by a collection of neurons called I. The spikes of
the middle layer are sent to the neurons of the output
layer, which is called J [31]. The neurons of each layer
are fully connected to those of the next layer. Between a
neuron and each neuron in the next layer there are m
connections, and each connection has a specific delay and
weight. This model is inspired by real neural tissue [32].
Figure 2. Spiking neural network and the connections between
neurons in this network [33]
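
How delayed, weighted connections determine a firing time
can be sketched as follows. The text does not give the
neuron model, so the spike-response kernel used here, the
threshold rule, the time step and all numeric values are
assumptions made only for illustration.

```python
from math import exp

def spike_response(s, tau=1.0):
    """Assumed post-synaptic potential: 0 before arrival, peak 1 at s = tau."""
    return (s / tau) * exp(1 - s / tau) if s > 0 else 0.0

def firing_time(input_spikes, connections, threshold, tau=1.0,
                t_max=20.0, step=0.01):
    """Return the first time the summed potential reaches the threshold.

    input_spikes: firing time of each presynaptic neuron.
    connections:  for each presynaptic neuron, a list of (weight, delay)
                  pairs, i.e. the m connections between two neurons.
    """
    for k in range(int(t_max / step) + 1):
        t = k * step
        potential = sum(
            weight * spike_response(t - t_in - delay, tau)
            for t_in, synapses in zip(input_spikes, connections)
            for weight, delay in synapses
        )
        if potential >= threshold:
            return t
    return None  # the neuron does not fire within t_max

# Two presynaptic neurons, each with m = 2 delayed connections.
spikes = [0.0, 1.0]
synapses = [[(0.6, 1.0), (0.4, 2.0)], [(0.5, 1.0), (0.5, 3.0)]]
t_fire = firing_time(spikes, synapses, threshold=0.5)
```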

B. Naive Bayes 

The Naive Bayes classifier uses a probabilistic framework
for solving classification problems. It starts from the
following conditional probabilities:

$$P(C \mid A) = \frac{P(A, C)}{P(A)} \tag{1}$$

$$P(A \mid C) = \frac{P(A, C)}{P(C)} \tag{2}$$

Combining (1) and (2), the following equation holds:

$$P(C \mid A) = \frac{P(A \mid C)\, P(C)}{P(A)} \tag{3}$$

Consider a record with the set of features
$(A_1, A_2, A_3, \ldots, A_n)$. Our goal is to determine
the class of this record, that is, among the existing
classes we look for the class $C$ that maximizes
$P(C \mid A_1 A_2 A_3 \ldots A_n)$. First, this probability
is calculated with the previous formula for every existing
class. The class with the maximum value is then taken as
the class of the new record. According to the formula
below, the denominator is the same for all classes, so the
goal is to find the class that maximizes the numerator.

$$P(C \mid A_1 A_2 A_3 \ldots A_n) = \frac{P(A_1 A_2 A_3 \ldots A_n \mid C)\, P(C)}{P(A_1 A_2 A_3 \ldots A_n)} \tag{4}$$
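
The maximization in (4) can be sketched for categorical
features. The sketch uses the standard "naive" assumption
that the features are conditionally independent given the
class, so the numerator factorizes into a product of
per-feature probabilities. Laplace smoothing and the toy
records are additions made for illustration only.

```python
from collections import Counter, defaultdict
from math import log

def train(records, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (class, position) -> value counts
    values = defaultdict(set)              # position -> distinct values seen
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def classify(model, record):
    """Return the class maximizing log P(C) + sum_i log P(A_i | C)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, n_c in class_counts.items():
        score = log(n_c / total)
        for i, value in enumerate(record):
            # Laplace (add-one) smoothing: an assumption, avoids log(0).
            count = feature_counts[(label, i)][value]
            score += log((count + 1) / (n_c + len(values[i])))
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical records: (genre, age group) -> class label.
records = [("comedy", "young"), ("comedy", "young"),
           ("drama", "old"), ("drama", "old")]
labels = ["like", "like", "dislike", "dislike"]
model = train(records, labels)
predicted = classify(model, ("comedy", "young"))
```

The denominator of (4) is never computed, because it is
the same for every class and does not change which class
attains the maximum.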
C. Decision tree

In classification models based on a decision tree, the
output knowledge is presented as a tree in which the
attribute values lead to different outcomes. There are
various methods for selecting the features and the stopping
condition when constructing a decision tree, so there are
numerous ways of building decision trees that are all based
on the same idea.

IV. THE PROPOSED HYBRID APPROACH 

In this section, a new hybrid movie recommender system
is presented that aims to increase prediction accuracy. We
combine multiple recommendation techniques to improve a
movie recommender system: the two techniques of
collaborative filtering and content-based filtering are
combined using an integrated hybrid scheme. An integrated
hybrid is a single recommender unit that integrates
different approaches by preprocessing and combining several
sources of knowledge. The types of data applied to
recommender systems are content-based information and
collaborative information. Content-based information
describes the items themselves. In the movie domain,
content-based data consist of styles (comedy, action, ...),
actors' names, release date, and more. Collaborative
information consists of the users' opinions in the data; in
the same domain, these are the users' ratings of movies.

Recommender systems that use only collaborative
information to find correlations between users and the
target user suffer from several problems. Therefore a
combination of methods is used: the integrated scheme
combines collaborative filtering and content-based
filtering. The proposed approach consists of two steps,
clustering of the samples and construction of the model, as
shown in Fig. 3. First, we cluster the selected samples so
that similar records fall in the same cluster. We then use
classification models to build the model.



[Figure: first step, clustering of samples; the output is
the movie rating]

Figure 3. Proposed hybrid approach

A. Clustering of samples

At this stage we use the k-Means clustering algorithm.
The main idea of this algorithm is to define k cluster
centers, one for each cluster. The goal of clustering is to
assign each data sample to the cluster whose center is at
the minimum distance. The average of the points belonging
to each cluster is then taken as the new cluster center.
Formula (5) is used to measure the degree of similarity.

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i| \tag{5}$$

In the formula, n is the number of features, and $x_i$ and
$y_i$ are the i-th features of the two records x and y.

B. Building the model with classification models

At this stage, the clusters created in the previous step
are used as the class labels for the classification models,
and classification models are used to build the model. The
samples are divided into two sets, a training set and a
test set; the number of samples in each set must be chosen
properly to obtain good results. In general, data mining
methods are classified into predictive and descriptive
methods. Predictive methods use the values of some
attributes to predict the value of a particular attribute.
Our goal in building the model is to obtain a model with
high accuracy. In the proposed approach, we use predictive,
supervised classification models. Our models are four
classifiers: Spiking Neural Network (SNN), Multi-Layer
Perceptron neural network (MLP), Decision tree, and Naive
Bayes. The procedure of the proposed approach is shown in
Fig. 4. As shown in Fig. 4, the process includes extracting
the dataset, creating a central data repository,
preprocessing, clustering the data, constructing the model
using classification models, and evaluating and
interpreting the model, at which stage we examine the
prediction accuracy.

Figure 4. Stages of the proposed approach
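
The clustering step of Section IV-A can be sketched as
follows: each sample is assigned to the nearest center
under the distance of formula (5), and each center is then
recomputed as the average of its members. The random
initialization, the fixed number of iterations and the toy
points are assumptions; the article does not specify them.

```python
import random

def manhattan(x, y):
    """Distance of formula (5): sum of absolute feature differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def k_means(points, k, iterations=20, seed=0):
    """Plain k-Means loop: assign to nearest center, then average members."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumed initialization
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: manhattan(p, centers[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, assignment

# Two well-separated hypothetical groups of records.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, assignment = k_means(points, k=2)
```

In the proposed approach, the resulting cluster index of
each record would then serve as its class label for the
classifiers of Section IV-B.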
V. EXPERIMENTAL RESULT AND EVALUATION

A. MovieLens dataset

The MovieLens dataset was established by the GroupLens
research group at the University of Minnesota. The
MovieLens data are available as three datasets. The dataset
used here consists of 100,000 ratings for 1682 movies by
943 users and contains three sets of data: user data, movie
data, and rating data.

B. Preprocessing

In this article we applied preprocessing operations before
building the models. The most important of them are coding,
feature subset selection, data transformation, sampling,
evaluation of the correlation between features, and time
coding based on spiking patterns. After creating the
central data repository from the three sets of user data,
movie data and rating data, we have a data matrix with
100,000 records, to which we applied the preprocessing
steps. In feature subset selection, we select the important
features. Fig. 5 shows the frequency of the movie styles;
drama, comedy and action are the most frequent. On this
basis we selected the 17 major styles out of the 19 movie
styles. From the user information we select features such
as user id, age, gender and occupation, and from the movie
information the 17 styles, the movie release year, movie
id, movie rating and timestamp. We then computed the
correlation matrix of the features. The features of the
MovieLens dataset are not correlated with each other; as a
result, we used all the features selected up to this stage
in the model construction process. A conversion function
was applied to the user's occupation and the user's age
attributes. A conversion was also applied to the input data
of the spiking neural network, and its output is converted
into spiking patterns.

After data preparation, the data are clustered. We must
select an appropriate number of clusters, so the data were
clustered into different numbers of clusters, from 10, 15
and 25 up to 100. We then created the neural network model
and, by reviewing and comparing the accuracy of the neural
network models for the different numbers of clusters, we
divided the data into 10 clusters. To build the model,
5,000 records were selected by random sampling.
Figure 5. Variety of movie styles



C. Results of models 

After the preprocessing stage, we evaluated the four
models, using 70% of the samples as the training set and
the remaining 30% as the test set. Fig. 6 compares the
models according to classification error. According to
Fig. 6, the Multi-Layer Perceptron neural network model and
Naive Bayes have a smaller classification error than the
other models. Fig. 7 compares the models by both
classification error and prediction accuracy. The
Multi-Layer Perceptron neural network model and Naive Bayes
have the higher prediction accuracy. The Naive Bayes model,
with a prediction accuracy of 99.8% and a classification
error of 0.2%, is the most appropriate model for the
MovieLens data.
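
The evaluation protocol can be sketched as below. The
sketch assumes that prediction accuracy is the complement
of classification error, which is consistent with the
reported pair of 99.8% and 0.2%; the shuffling seed and the
toy labels are hypothetical.

```python
import random

def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples and split them into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def classification_error(true_labels, predicted_labels):
    """Fraction of test samples whose predicted class is wrong."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)

# 5,000 sampled records split 70% / 30%, as in the experiment.
train_set, test_set = train_test_split(range(5000))

# Toy labels only; they are not results from the article.
error = classification_error([1, 2, 3, 4], [1, 2, 3, 0])
accuracy = 1.0 - error
```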



VI. CONCLUSION AND FUTURE WORK

In this article, a new approach using the techniques of
collaborative filtering and content-based filtering in a
hybrid movie recommender system was presented. The dataset
used is MovieLens. Four models were presented: Spiking
Neural Network (SNN), Multi-Layer Perceptron neural network
(MLP), Decision tree, and Naive Bayes. The four models were
evaluated by prediction accuracy and classification error.
Evaluating and comparing the results of the four models
shows an improvement in recommender systems.

A spiking neural network was used for the first time in
the field of recommender systems. The results show that
this network makes predictions in the movie recommender
system with a classification error of 0.52%. The results
also show that the hybrid approach, by addressing the
weaknesses of the single filtering methods, is suitable for
recommender systems. In the results presented in this
article, the Multi-Layer Perceptron neural network achieved
a prediction accuracy of 95.87%, the Decision tree model
92.61%, and the Naive Bayes classifier 99.83%. To improve
recommender systems further, different hybrid approaches
can be used that combine collaborative filtering,
content-based filtering, demographic filtering, and
knowledge-based filtering. Integrated, parallel and
pipelined hybridization techniques can also be used. Since
spiking neural networks are newer and simulate the brain
more realistically, this network can be used in various
fields.



[Figure: bar chart of the classification error of the four
models: Spiking Neural Network (SNN), 0.52; Multi-Layer
Perceptron network (MLP); Naive Bayes; Decision tree]

Figure 6. Results of the models based on the classification error measure

Figure 7. Results of the models with classification error and prediction accuracy
Abstract - Virtualization is a very efficient mechanism
that has improved the utilization of machines. This
technique has been implemented in many systems and allows a
single physical machine to run different operating
systems simultaneously. Another benefit of
virtualization is live migration. This feature allows us
to move virtual machines to different hosts easily. In
this article I install different operating systems on
my laptop and try to migrate them to another computer
in order to share data in a short amount of time.

Keywords: virtualization; teleportation; virtual machine; 
hypervisor; live migration; 

1. INTRODUCTION 

The need for virtualization dates back to the 1960s.
Nowadays virtualization has moved to a new level and is
widely used.

In computing, virtualization refers to the act of
creating a virtual (rather than actual) version of
something, including but not limited to a virtual
computer hardware platform, operating systems,
storage devices or network resources [1].

Without virtualization a physical host cannot run
multiple operating systems simultaneously. A virtual
machine is created using software called a
hypervisor, which acts as a kind of manager for
virtual machines.

Due to the importance of virtualization, today's
hypervisors are very advanced. Even current operating
systems such as Windows 8 have a built-in hypervisor,
Hyper-V. Other well-known hypervisors are VMware ESX
and ESXi, Citrix Xen and Oracle VM. Each of these
hypervisors offers a feature called teleportation or
live migration. Through live migration we can move a
virtual machine from one physical host to another
with little effort.

2. VIRTUALIZATION BENEFITS 
Virtual machines have many benefits [2]:

• Reduce the number of machines: a single physical
machine can run multiple systems inside it, so
fewer computers are needed.

• Increase efficiency: computers are used more
efficiently when they run several operating
systems that support different software
components as needed.

• Faster development: virtualization makes it
possible to set up test environments very
quickly. Even in case of errors not everything
is lost, which makes a virtual environment well
suited to testing.

• Simple management: on a single machine you can
manage many systems and all their components.
VMware lets you administer both virtual and
physical machines simultaneously.

• Reduce costs: by virtualizing you do not need to
buy many servers. On a single server you can
virtualize many systems, so the heat produced by
the servers is reduced and the whole process
costs less.



3. RELATED WORKS 

Many articles and experiments have addressed
virtualization and migration, and many authors have
studied different features of them [4]. These articles
evaluate performance, installation, advantages and the
different options. Many authors have studied the
performance of virtual machines by measuring the
response time of a requested operation. Other articles
in the live migration field, besides creating a
migration environment and analyzing the benefits of
migration, also estimate the downtime and the total
migration time [3].
Even with all these articles and research in this
field, one question is still not clearly answered:

Why is live migration used?

In most big companies live migration is used to
manage servers in a simple way. Because virtual
machines can be moved so easily, server maintenance
costs little in terms of time and availability.

Another big benefit of live migration is load
balancing across servers. When a server is overloaded,
quickly migrating part of the load to another server
restores a good balance.

In home conditions like mine, however, the only
benefit of performing it is simple data sharing,
especially when you have different operating systems
that cannot easily share data over a home network.



4. ENVIRONMENT OF THE EXPERIMENT 

The experiment is carried out on two computers running
Windows 7 with CPUs of the same family (Dual Core and
Core 2 Duo) that are on the same network. The
computers have 6 GB and 4 GB of RAM, respectively.



5. THEORY OF THE EXPERIMENT 

In this paper I try to perform a live migration of a
virtual machine using VirtualBox as the hypervisor and
observe how much time it requires, as well as the CPU
consumption and memory used. VirtualBox from Oracle is
a desktop hypervisor and does not require a server to
install virtual machines and perform live migration.
However, live migration in VirtualBox requires storage
shared between the source computer and the destination
computer. The two computers must also be on the same
network.

Since I do not have shared storage between my two
computers, I create it virtually. For this I need
software called FreeNAS to provide iSCSI disks.
FreeNAS is a simple BSD-based OS that helps me create
shared storage between my two computers to perform
the teleportation.

After installing FreeNAS in a VM, I configure its LAN
interface and give it the IPv4 address 192.168.1.250:80
so that it can communicate with the host computer or
other virtual machines that need iSCSI disks. I make
sure this communication is not blocked by any firewall
by pinging from the FreeNAS shell.

Then we access the web GUI of FreeNAS from a web
browser on the host computer, or on any other VM that
has an OS, and configure the disks of the FreeNAS VM.
I create one iSCSI disk of 10 GB in order to install
the guest OS in another VM.

After the disks are created we need to add them to the
VirtualBox hard drives. I do this by executing a
command in CMD that adds the iSCSI disk to the
VirtualBox hard disks:

VBoxManage addiscsidisk --server 192.168.1.250
--target iqn.2007-09.jp.ne.peach.istgt:auroradisk

Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\Admin>cd..
C:\Users>cd..
C:\>cd program files
C:\Program Files>cd sun
C:\Program Files\Sun>cd virtualbox
C:\Program Files\Sun\VirtualBox>vboxmanage addiscsidisk --server 192.168.1.250 --target iqn.2007-09.jp.ne.peach.istgt:auroradisk
Sun VirtualBox Command Line Management Interface Version 3.1.6
(C) 2005-2010 Sun Microsystems, Inc.
All rights reserved.

iSCSI disk created. UUID: a29575f2-5d3c-4118-996c-645ba9068758
C:\Program Files\Sun\VirtualBox>

Figure 1. Script execution

After this command we see in:

File -> Virtual Media Manager -> Hard Disks

that a new disk has been added, of type .iscsi rather
than .vdi, and not attached to any virtual machine.
Now I create a new virtual machine and, instead of
creating a new hard disk, use the .iscsi disk created
before. I install Ubuntu 10.04 as the guest OS. These
steps are performed on the source computer.

[Screenshot: Virtual Media Manager, Hard Disks tab,
listing FreeNAS.vdi (virtual size 2.00 GB, actual size
72.01 MB), aurora.vdi (10.05 GB, 41.00 KB) and the new
iSCSI disk (9.77 GB, 9.77 GB). Location:
192.168.1.250|iqn.2007-09.jp.ne.peach.istgt:auroradisk;
Type (Format): Normal (iSCSI); Attached to: Not
Attached]

Figure 2. New disk added

On the destination computer we create another virtual
machine with the same settings as the virtual machine
on the source computer and select as its hard disk the
shared disk on which we installed the Ubuntu guest OS
on the source computer. To attach the shared disk to
VirtualBox on the destination computer I run the same
command:

VBoxManage addiscsidisk --server 192.168.1.250
--target iqn.2007-09.jp.ne.peach.istgt:auroradisk

On the destination computer we do not install
anything. Now the virtual machine is ready to be
teleported.



6. EXPERIMENT

Now that I have prepared the environment for the
experiment, I perform the teleportation and measure
the time it requires and its CPU and memory
consumption. First I prepare the target machine
created before to wait for an incoming VM. I do this
by running this command in cmd (any free port can be
used):

VBoxManage modifyvm ubuntu2_aurora --teleporter on
--teleporterport 1234

Figure 3. Starting the teleportation

Figure 4: Virtual machine teleported

On the source computer I start the virtual machine
ubuntu1_aurora and run this command:

VBoxManage controlvm ubuntu1 teleport --host
192.168.1.3 --port 1234

A progress bar is now shown in cmd, indicating the
percentage of the teleportation that is completed. At
this moment we measure the CPU and memory usage of
this process on the source computer in Task Manager.
I also measure the time needed to perform the
teleportation.

I expect that it should not take much time, because
we are using shared storage: the VM's data are on the
shared storage, so the disk contents do not have to be
copied and only the running state of the machine is
transferred. I perform this test a few times to get a
reliable result.
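
One way to time the teleportation instead of using a
stopwatch is sketched below. The helper names are
hypothetical; the VBoxManage arguments mirror the command
used above, and the timing itself is demonstrated on a
harmless command so that the sketch runs without
VirtualBox installed.

```python
import subprocess
import sys
import time

def teleport_command(vm_name, host, port):
    """Build the argument list of the teleport command used in this article."""
    return ["VBoxManage", "controlvm", vm_name, "teleport",
            "--host", host, "--port", str(port)]

def timed_run(command):
    """Run a command and return (elapsed seconds, exit code)."""
    start = time.perf_counter()
    completed = subprocess.run(command, check=False)
    return time.perf_counter() - start, completed.returncode

command = teleport_command("ubuntu1", "192.168.1.3", 1234)

# Demonstration with a no-op Python process; on the source computer one
# would call timed_run(command) instead.
elapsed, exit_code = timed_run([sys.executable, "-c", "pass"])
```

This measures the wall-clock time of the whole command,
which includes process start-up, so it is only an
approximation of the migration time itself.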

7. RESULTS 

The first time I tried the experiment I was using a
Pentium 4 instead of the Dual Core; CPU and RAM usage
went to 100% on the source computer, leading to a
deadlock because of a "CPU time mismatch error".

Figure 5: CPU used during the experiment (deadlock,
CPU mismatch)

Then I tried it on another computer and it worked
correctly.

The machine is teleported and on the destination
computer it is in the same state as it was before. In
the second teleportation I created a .txt file on the
source computer and typed some words in it.

When the teleportation is completed we see on the
destination computer that the file is open as well,
with the words in it. It took approximately 7 s to
complete the teleportation. I performed it two more
times to compare the results.



Table 1: CPU consumption and RAM used

Teleportation          1         2         3
Time (s)               7.1       8.1       7.7
CPU (%)                2.1       2.2       2.1
Private memory (kB)    41.241    42.133    40.175
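
The three runs of Table 1 can be summarized with a short
calculation. Only the time and CPU values from the table
are used; the variable names are illustrative.

```python
from statistics import mean

# Values copied from Table 1 (three teleportation runs).
times_s = [7.1, 8.1, 7.7]
cpu_percent = [2.1, 2.2, 2.1]

mean_time = mean(times_s)            # average teleportation time in seconds
time_spread = max(times_s) - min(times_s)
mean_cpu = mean(cpu_percent)         # average CPU usage in percent
```

The average time is about 7.6 s with a spread of about 1 s
between the fastest and the slowest run, and the average
CPU usage is about 2.1%.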



8. CONCLUSIONS 

Live migration is a very useful and effective
technique that has greatly improved server
management and maintenance. Even this simple
experiment has demonstrated:






http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 

Vol. 12, No. 10, October 2014 



• An easy way to migrate virtual machines
using the VirtualBox hypervisor

• The teleportation of virtual machines is
performed in a minimal amount of time

• The teleported machine at the destination is in
the same state as it was on the source computer

• This simple migration technique can be
executed several times, and each time we can
measure and evaluate the CPU and
RAM used during the teleportation



[9] http://www.sysprobs.com/setup-test-virtualbox-teleportation-normal-pc-live-migration-virtual-machines

[10] https://blogs.oracle.com/vreality/entry/teleporting



9. FUTURE WORKS 

Virtualization and live migration are very efficient
technologies that are commonly used today, and in the
future they will be implemented in many more systems.
Other software components may be developed and
installed to perform live migration between two or
more machines more easily and quickly.
Abstract

In wireless, satellite, and space communication
systems, reducing errors is critical. The high bit
error rates of wireless communication systems
require employing various coding methods on the
transferred data. To address the large latency and
degraded network throughput caused by the
retransmissions triggered by frame loss in
high-speed wireless networks, this paper studies
and investigates the performance of fountain codes,
which are used to encode and decode the data
stream in digital communication. This solution
encodes a number of redundant frames from the
original frames, depending on the link loss rate,
so that a receiver can effectively recover lost
original frames without significant
retransmissions. Many digital fountain coding
methods have been invented, such as Tornado codes,
Luby transform (LT) codes and Raptor codes.

Keywords: wireless communication systems, 
fountain codes, Tornado codes, Luby 
transform, Raptor codes 

1. Introduction 

Wireless networking technologies have been widely
deployed in civil and military applications such as
3G/4G and IEEE 802.11 WLAN networks. However,
wireless communication suffers from frame losses
due to channel fading, shadowing, mobility and
transmission collisions (interference). Frame loss
significantly undermines wireless network
performance in that 1) latency is increased and
2) throughput is degraded. The large latency is
incurred by the retransmission of lost frames in the
MAC layer, which is part of most MAC protocols for
reliable link-layer point-to-point transmission [6].
On the Internet, data is transmitted in the form of
packets. Each packet is equipped with a header
that describes the source and the destination of the
packet, and often also a sequence number
describing the absolute or relative position of the
packet within a given stream. These packets are
routed on the network from the sender to the
receiver [16]. For various reasons, for example
buffer overflows at the intermediate routers, some
packets may get lost and never reach their
destination. Other packets may be declared lost if
the internal checksum of the packet does not match.
Therefore, the Internet is a very good real-world
model of the binary erasure channel (BEC).

Reliable transmission of data over the Internet has 
been the subject of much research. For the most 
part, reliability is guaranteed by use of appropriate 
protocols. For example, the ubiquitous TCP/IP 
ensures reliability by essentially retransmitting 
packets within a transmission window whose 
reception has not been acknowledged by the 
receiver (or packets for which the receiver has 
explicitly sent a negative acknowledgment). It is 
well known that such protocols exhibit poor
behavior in many cases, such as transmission of
data from one server to multiple receivers, or
transmission of data over heavily impaired
channels, such as poor wireless or satellite links.
Moreover, ack-based protocols such as TCP 
perform poorly when the distance between the 
sender and the receiver is long, since large 
distances lead to idle times during which the 
sender waits for an acknowledgment and cannot 
send data. For these reasons, other transmission 
solutions have been proposed. One class of such 
solutions is based on coding. The original data is 
encoded using some linear erasure correcting 
code. If during the transmission some part of the 
data is lost, then it is possible to recover the lost 
data using erasure correcting algorithms. For 
applications it is crucial that the codes used are 
capable of correcting as many erasures as 
possible, and it is also crucial that the encoding 
and decoding algorithms for these codes are very 
fast. 

Reed-Solomon codes can be used to partially 
compensate for the inefficiency of random codes. 
Reed-Solomon codes can be decoded from a 
block with the maximum possible number of 
erasures in time quadratic in the dimension. 
(There are faster algorithms based on fast 
polynomial arithmetic, but these algorithms are 
often too complicated in practice.) However, 
quadratic running times are still too large for 
many applications. 

In [16, 18], the authors construct codes with linear 
time encoding and decoding algorithms that can 
come arbitrarily close to the capacity of the BEC. 
These codes, called Tornado codes, are very 
similar to Gallager's low-density parity-check 
(LDPC) codes [19], but they use a highly irregular 
weight distribution for the underlying graphs. 

Fountain codes are ideally suited for transmitting
information over computer networks. A server
sending data to many recipients can implement a
Fountain code for a given piece of data to
generate a potentially infinite stream of packets.
As soon as a receiver requests data, the packets
are copied and forwarded to the recipient. In a
broadcast transmission model there is no need for
copying the data since any outgoing packet is
received by all the receivers. In other types of
networks, the copying can be done actively by the
sender, or it can be done by the network, for
example if multicast is enabled. The recipient
collects the output symbols, and leaves the
transmission as soon as it has received n of them.
At that time it uses the decoding algorithm to
recover the original k symbols. Note that the
number n is the same regardless of the channel
characteristics between the sender and the
receiver. More loss of symbols just translates to a
longer waiting time to receive the n packets. If n
can be chosen to be arbitrarily close to k, then the
corresponding Fountain code has a universality
property in the sense that it operates close to
capacity for any erasure channel with erasure
probability less than 1.

This paper reviews the channel coding methods in
the physical layer and Digital Fountain coding
proposals in the application layer. The paper is
organized as follows. Section 2 discusses the
architecture, applications and limitations of RS
codes. Section 3 presents the properties of
fountain codes, the related erasure channel and
their construction. Section 4 reviews the powerful
fountain codes produced by recent advances, such
as Luby Transform (LT) codes, Tornado codes
and Raptor codes. Finally, the conclusion is
presented in Section 5.

2. Developing Background, Related Work and Motivations

2.1. Error correction control (ECC) 

Coding techniques are used in communication
systems to improve the reliability and efficiency
of the communication channel. The reliability is
commonly expressed in statistical terms such as
the probability of receiving the wrong
information, that is, information that differs from
what was originally transmitted. Error control is
concerned with techniques of delivering
information from a source (the sender) to a
destination (the receiver) with a minimum of
errors. In a digital audio recorder system, the
sound signal is digitized in the form of (binary)
symbols. To make it possible to record the digital
data reliably, the data are, prior to recording,
translated in two successive steps: (a) an
error-correcting code and (b) a recording code.
The output generated by the recording code is
stored on the storage medium in the form of
binary physical quantities.

Error correction control is realized by adding
extra symbols to the conveyed message. These
extra symbols make it possible for the receiver to
detect and/or correct some of the errors that may
occur in the retrieved message. The main
challenge is to achieve the required protection
against the inevitable transmission errors without
paying too high a price in added symbols.
There are many different families of
error-correcting codes; of major importance for
recording applications is the family of
Reed-Solomon (RS) codes.

2.2. Binary Erasure Channel 

The Binary Erasure Channel (BEC) is a channel
model where the receiver either receives the
transmitted bit or is informed of the erasure of
the bit, that is, the bit was not received or was
erased. Therefore, the receiver has no idea about
the transmitted bit with a certain probability p,
and is exactly sure about the transmitted bit with
probability 1 - p. According to Shannon, the
capacity of the BEC is 1 - p, which means that
for an alphabet of size $2^k$, where k is the
number of bits per symbol, no more than
(1 - p)k bits/symbol can be reliably
communicated over the binary erasure channel
[20].
Additionally, any feedback from the receiver to
the transmitter will not increase the capacity of
the channel, and reliable communication should
be possible at this rate. Automatic Repeat Request
(ARQ) schemes have long been used as a
classical approach to solve the reliable
communication problem. However, the excessive
number of feedback messages used in the case of
erasures causes wasteful usage of bandwidth,
network overloads and intolerable delays.

Fountain codes, also known as rateless erasure
codes, are a class of erasure codes with the
property that a potentially limitless sequence of
encoding symbols can be generated from a given
set of source symbols such that the original
source symbols can ideally be recovered from any
subset of the encoding symbols of size equal to or
only slightly larger than the number of source
symbols. The term fountain or rateless refers to
the fact that these codes do not exhibit a fixed
code rate.
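
The BEC model described above can be sketched in a
few lines of Python. This is only an illustration
under assumed values: the erasure probability
p = 0.3, the seed and the helper names
bec_transmit and empirical_rate are not taken from
the paper. The fraction of bits that survive should
be close to the capacity 1 - p.

```python
import random

def bec_transmit(bits, p, rng):
    """Pass bits through a binary erasure channel; None marks an erasure."""
    return [None if rng.random() < p else b for b in bits]

def empirical_rate(received):
    """Fraction of positions that arrived unerased."""
    return sum(r is not None for r in received) / len(received)

rng = random.Random(1)      # fixed seed so the run is repeatable
p = 0.3                     # assumed erasure probability
sent = [rng.randint(0, 1) for _ in range(100_000)]
received = bec_transmit(sent, p, rng)
rate = empirical_rate(received)   # expected to be close to 1 - p
```

Every received position is either the transmitted
bit or an erasure; the channel never flips a bit,
which is what distinguishes it from an error
channel.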

2.3. Reed-Solomon code 

A Reed-Solomon (RS) code is an error-correcting
code first described in a paper by Reed and
Solomon in 1960 [1]. RS encoding is relatively
straightforward, but decoding is time-consuming,
despite major efficiency improvements made by
Berlekamp and others during the 1960s. Only in
the past few years has it become computationally
possible to send high-bandwidth data using RS.
RS codes are non-binary cyclic error-correcting
codes. The RS encoder takes a block of digital
data and adds extra redundant bits. When errors
occur during transmission or storage, the RS
decoder processes each block and attempts to
correct the errors and recover the original data.
The number and type of errors that can be
corrected depend on the characteristics of the RS
code.

2.3.1. Encoding Of RS Codes 

The basic structure of an RS code is shown in
Figure 1: a codeword of n symbols is the union of
two segments, the information symbols (k) and
the parity symbols (2t). The k information
symbols carry the message to be transmitted, and
the 2t parity symbols are the redundancy added to
the message so that it can be transferred from
source to destination without error despite noise
[7].



14 



http://sites.google.com/site/ijcsis/ 
ISSN 1947-5500 



(IJCSIS) International Journal of Computer Science and Information Security, 
Vol. 12, No. 10, October 2014 



[Figure 1 labels: codeword (n symbols); symbol (m bits); original message (k symbols); parity (n - k = 2t symbols)]

Figure (1): Encoding of RS codes

Reed-Solomon codes are nonbinary cyclic codes
with symbols made up of m-bit sequences, where
m is any positive integer having a value greater
than 2. R-S (n, k) codes on m-bit symbols exist for
all n and k for which

$0 < k < n < 2^m + 2$    (1)

where k is the number of data symbols being
encoded, and n is the total number of code
symbols in the encoded block. For the most
conventional R-S (n, k) code,

$(n, k) = (2^m - 1,\ 2^m - 1 - 2t)$    (2)

where t is the symbol-error correcting
capability of the code, and $n - k = 2t$ is the
number of parity symbols. An extended R-S code
can be made up with $n = 2^m$ or $n = 2^m + 1$.
Reed-Solomon codes achieve the largest possible
code minimum distance for any linear code with
the same encoder input and output block lengths.
For nonbinary codes, the distance between two
code words is defined (analogous to Hamming
distance) as the number of symbols in which the
sequences differ. For Reed-Solomon codes, the
code minimum distance is given by

$d_{min} = n - k + 1$    (3)

The code is capable of correcting any combination
of t or fewer errors, where t can be expressed as
[3]:

$t = \lfloor (d_{min} - 1)/2 \rfloor = \lfloor (n - k)/2 \rfloor$    (4)

RS differs from a Hamming code in that it
encodes groups of bits instead of one bit at a time.
We will call these groups "digits" (also "symbols"
or "coefficients"). A digit is error-free if and only
if all of its bits are error-free.

The classical coding scheme for recovering
erasures is the Reed-Solomon code [1, 3],
employed in a variety of commercial applications,
most notably in data storage as a key component
of compact disks. In coding theory, Reed-Solomon
codes are an example of Maximum Distance
Separable (MDS) codes, which achieve the
Singleton bound [4]. MDS codes are practical
codes that achieve the capacity of the erasure
channel. An (n, k, d) MDS code has the property
that any k coordinates constitute an information
set [2]. A receiver that receives any k symbols
from a total of n symbols in each codeword can
reconstruct the original message, provided it
knows the positions of the k received symbols.
Reed-Solomon (RS) codes are the most
well-known MDS codes. These can be decoded in
time $O(k^2)$ using algebraic methods such as
list decoding.
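
As a worked instance of the parameter relations in
this subsection (conventional code length
n = 2^m - 1, n - k = 2t parity symbols, minimum
distance n - k + 1), the following minimal Python
sketch computes the parameters from the symbol size
m and the correcting capability t. The function
name rs_parameters and the returned dictionary
layout are illustrative choices, not part of any
library.

```python
def rs_parameters(m, t):
    """Parameters of the conventional RS code on m-bit symbols
    that corrects up to t symbol errors."""
    n = 2**m - 1                 # code length in symbols
    k = n - 2 * t                # number of data symbols
    if k <= 0:
        raise ValueError("t is too large for this symbol size")
    d_min = n - k + 1            # MDS codes meet the Singleton bound
    return {"n": n, "k": k, "d_min": d_min,
            "t": (d_min - 1) // 2, "rate": k / n}

params = rs_parameters(m=8, t=16)
# With 8-bit symbols and t = 16 this yields the (255, 223) code
# with minimum distance 33.
```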

2.3.2. Decoding Of RS Codes 

The RS decoder consists of two main stages, an
error detection stage and an error correction
stage, as shown in Figure (2) [15]. Firstly, a
serial syndrome computation is used to check
whether the received codeword is a valid
codeword. If errors occurred during transmission,
the decoder carries out error detection and then
tries to correct these errors. Secondly, the key
equation solver is used as the decoding algorithm
to find the coefficients of the error-location
polynomial $\sigma(x)$ and the error-evaluator
polynomial W(x). Thirdly, the Chien search block
is used to find the roots of $\sigma(x)$, which
represent the inverses of the error locations.
Fourthly, the Forney algorithm block is used to
find the values of the errors. Finally, after getting
the values and locations of the errors, the received
codeword can be corrected by XOR-ing the
received vector with the error vector.

Figure (2): RS decoder

2.3.3. Reed-Solomon Codes applications 

Reed-Solomon codes are the most frequently used
digital error control codes in the world, due to
their use in computer memory and non-volatile
memory applications. A brief list of significant
applications includes the Digital Audio Disk,
Deep Space Telecommunication Systems, Error
Control for Systems with Feedback,
Spread-Spectrum Systems, and Computer
Memory [8].

2.3.4. Reed-Solomon Codes limitations 

Despite their popularity, RS codes are not suitable
for bulk data distribution over the Internet. The
quadratic decoding time is unacceptable when
data rates are of the order of Mbps [2].
Furthermore, typical RS code implementations
have small block lengths, such as the NASA
standard (255, 223, 33) code over $F_{256}$. This
requires a large file to be segmented into many
small blocks before transmission. Finally, since
RS codes are block codes, they need to be
designed for a specific rate. This requires
estimating the erasure probability of the channel
beforehand, which is clearly not possible when
multiple clients over channels of different quality
are being served simultaneously.

3. Fountain codes 

A fountain code is a forward-error-control code
that can produce as many redundant packets as
needed for packet erasure correction. Unlike
automatic-repeat-request (ARQ) transmission, 
fountain coding does not require the destination to 
inform the source of the identities of the packets 
that are erased or even keep track of which 
packets are erased. We examine the use of 
fountain coding for both unicast and multicast 
transmission in packet radio systems, where 
communication takes place over time-varying 
channels with fading, shadowing, and other types 
of propagation losses. 

In [16], Shokrollahi states, "A decoding algorithm
for a Fountain code is an algorithm which can
recover the original k input symbols from any set
of n output symbols with high probability. For
good Fountain codes the value of n is very close
to k. Note that the number n is the same regardless
of the channel characteristics between the sender
and the receiver. More loss of symbols just
translates to a longer waiting time to receive the n
packets." Thus, for noisy wireless channels, the
waiting time, and consequently the overall
throughput performance of the fountain coding
system, depends heavily on the selection of the
channel code and the modulation format used to
transmit the wireless signals.

Network coding, proposed for wireless mesh
networks by Katti et al., not only forwards
packets but also mixes packets from different
sources into a single transmission and
decomposes the packets at the receiver. In upper
layers, a coding concept called Digital Fountain
was introduced in 1998 by Byers et al. to generate
a stream of packets, including some redundant
packets, like a water fountain, to address potential
packet loss in multicast applications that do not
allow retransmission. Since then, many Digital
Fountain coding methods have been invented,
such as Luby transform codes (LT codes) and
Raptor codes.

Consider a setting where a large file is
disseminated to a wide audience who may want to
access it at various times and have transmission
links of different quality. Current networks use
unicast-based protocols such as the Transmission
Control Protocol (TCP), which requires a
transmitter to continually send the same packet
until acknowledged by the receiver. It can easily
be seen that this architecture does not scale well
when many users access a server concurrently and
is extremely inefficient when the information
transmitted is always the same. In effect, TCP and
other unicast protocols place strong importance on
the ordering of packets to simplify coding at the
expense of increased traffic.

3.1. Digital fountain codes 

The digital fountain was devised as the ideal
protocol for transmission of a single file to many
users who may have different access times and
channel fidelity. The name is drawn from an
analogy to water fountains, where many can fill
their cups with water at any time. The output
packets of digital fountains must be universal like
drops of water and hence be useful independent of
time or the state of a user's channel [12, 13].

As shown in Figure (3), the encoder of a fountain
code is like a fountain spewing water: an
unlimited number of coded symbols can be
produced. Source data is divided into k input
symbols of size l. With fountain codes, the k input
symbols are combined into an unlimited number
of encoding symbols at the source. All k input
symbols can be recovered from any set of
$(1 + \epsilon)k$ encoding symbols, where
$0 < \epsilon < 1$. The encoder of a fountain code
is bit-rate independent: it is not limited by the
size of the source data and can generate an
unlimited number of encoding symbols.




Figure (3): Fountain code

3.2. Fountain codes properties

Consider a file that can be split into k packets or 
information symbols and must be encoded for a 
BEC. A digital fountain that transmits this file 
should have the following properties: 

1. It can generate an endless supply of encoding 
packets with constant encoding cost per packet in 
terms of time or arithmetic operations. 

2. A user can reconstruct the file using any k 
packets with constant decoding cost per packet, 
meaning the decoding is linear in k. 

3. The space needed to store any data during 
encoding and decoding is linear in k. 

These properties show digital fountains are as 
reliable and efficient as TCP systems, but also 
universal and tolerant, properties desired in 
networks. 

3.4. Fountain Code Construction Outline 

Fountain Codes are a new class of codes designed
and ideally suited for reliable transmission of data
over an erasure channel with unknown erasure
probability. The encoder can produce a
potentially infinite number of output symbols.
Output symbols can be bits or more general bit
sequences. However, random linear Fountain
Codes have an encoding complexity of $O(N^2)$
and a decoding complexity of $O(N^3)$, which
makes them impractical for today's applications.

The fountain code constructions we provide all 
have the property that encoded symbols are 
generated independently of one another. In 
addition, we will assume that the set of received 
encoded symbols is independent of the values of 
the encoded symbols in that set, an assumption 
that is often true in practice. These assumptions 
imply that for a given value of k, the probability 
of decoding failure is independent of the pattern 
of which encoded symbols are received and only 
depends on how many encoded symbols are 
received. 

3.4.1. Fountain Coding 

A fountain code is optimal if the original k source 
symbols can be recovered from any k encoding 
symbols. Fountain codes are known that have 
efficient encoding and decoding algorithms and 
that allow the recovery of the original k source 
symbols from any k' of the encoding symbols 
with high probability, where k' is just slightly 
larger than k. 

Digital fountains have changed the standard 
transmission paradigm. A digital fountain can 
encode and transmit an unlimited number of data 
packets until every user gets enough information 
to guarantee correct decoding. Multimedia
broadcasting and emerging peer-to-peer
applications are only two examples of the many
scenarios where digital fountains can be
successfully applied.



Figure (4): Fountain code



As shown in Figure (4) [17], consider a file that
can be split into k packets or information symbols
and must be encoded for a BEC. Regardless of the
erasure probability, Fountain Codes are near
optimal for every BEC. Therefore, on the BEC,
Fountain Codes are called universal codes. A
message consists of $k \cdot l$ bits and each drop
contains $l$ bits.

Whoever collects any $k' > k$ drops of $l$ bits,
where $k'$ is slightly larger than $k$, can recover
the original message with high probability.
Fountain Codes can be implemented at random
with an average degree of K/2. Here, the degree
is the number of ones divided by the total number
of bits in the generator matrix. The average degree
of the generator matrix determines the complexity
of the encoding and decoding process. The higher
the degree, the higher the complexity at the
transmitter and receiver side and the more
successful the receiver is in the decoding phase.
Let us assume that we transmit K source symbols
$s_1, s_2, s_3, \ldots, s_K$ with a random
generator matrix of degree K/2. The encoding
process of Fountain Codes is given by the
following equation [9]:

$t_n = \sum_{k=1}^{K} s_k G_{kn}$    (5)

where $t_n$ indicates the transmitted symbols.

$G_{kn}$ can be generated at the transmitter side
pseudo-randomly from a random seed, namely a
key, and transmitted to the receiver, causing an
extra overhead cost. As long as the symbol size is
much larger than the key size, this overhead is
negligible. Another way to produce a unique
$G_{kn}$ is to synchronize the receiver and the
transmitter with the same clock pulses and to use
deterministic random number generators at both
sides.
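
The encoding rule of equation (5) with a
seed-derived generator matrix can be sketched as
follows. This is a minimal illustration, not the
scheme of any particular implementation: the way
the column index is mixed into the seed, the
one-byte symbols and the names generator_column
and encode_symbol are all assumptions made for the
example.

```python
import random

def generator_column(k, seed, n):
    """Column n of G: k pseudo-random bits derived from (seed, n).
    The seed mixing below is a hypothetical key schedule."""
    rng = random.Random(seed * 1_000_003 + n)
    return [rng.randint(0, 1) for _ in range(k)]

def encode_symbol(source, seed, n):
    """t_n = XOR of the source symbols s_k for which G_kn = 1."""
    t = 0
    for s, g in zip(source, generator_column(len(source), seed, n)):
        if g:
            t ^= s
    return t

source = [0x12, 0x34, 0x56, 0x78]     # four one-byte source symbols
stream = [encode_symbol(source, seed=42, n=n) for n in range(8)]
```

Because the column depends only on the shared seed
and the symbol index n, a receiver that knows both
can regenerate the same column without it being
transmitted, which is the low-overhead option
described above.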

3.4.2. Fountain decoding 

The decoding process of Fountain Codes is given
as:

$s_k = \sum_{n=1}^{N} t_n G^{-1}_{nk}$    (6)

For a K×K matrix G to be invertible, each row
must be linearly independent of the others. The
probability that the first row is not an all-zero
row is $1 - 2^{-K}$; the probability that the
second row is neither all zero nor the same as the
first row is $1 - 2^{-K+1}$. Iterating until K, we
get the overall success rate:

$1 - \delta = \prod_{i=0}^{K-1} (1 - 2^{-K+i})$    (7)

$1 - \delta$ is approximately 0.289 for K > 10.
When N ≥ K rows are received, the probability
$\delta$ that the N×K binary matrix cannot be
inverted is upper bounded by $2^{-(N-K)}$.
Accordingly, each additional row increases the
success probability drastically. Thus, as the
message size increases, random Fountain Codes
come arbitrarily close to the channel capacity.
Despite a very small overhead and independence
from the erasure rate, random Fountain Codes
have a quadratic encoding complexity (K bits
times the degree K/2) and a cubic decoding
complexity of about $2K^3/3$. This makes them
unsuitable for most applications, such as mobile
broadcasting, where only limited processing
power is available at the receiver side. Several
more efficient digital Fountain coding methods
have since been invented, such as Luby transform
codes (LT codes), Tornado codes and Raptor
codes.
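
The invertibility argument and the matrix-inversion
decoding described in this subsection can be
checked numerically with a short Python sketch.
The representation is an assumption made for the
example: each received row of G is stored as an
integer bit mask (bit i set means source symbol i
was XORed in), and the names success_probability
and solve_gf2 are illustrative.

```python
def success_probability(k):
    """Probability that a random k x k binary matrix is invertible."""
    prob = 1.0
    for i in range(k):
        prob *= 1.0 - 2.0 ** (-(k - i))
    return prob

def solve_gf2(rows, values, k):
    """Gaussian elimination over GF(2).

    Returns the k source symbols, or None if the received rows
    do not have rank k (decoding failure)."""
    pivots = {}                        # leading bit -> (mask, value)
    for mask, val in zip(rows, values):
        while mask:
            lead = mask.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = (mask, val)
                break
            pmask, pval = pivots[lead]
            mask ^= pmask              # cancel the leading bit
            val ^= pval
    if len(pivots) < k:
        return None
    solution = [0] * k
    for bit in range(k):               # back-substitute, lowest bit first
        mask, val = pivots[bit]
        for lower in range(bit):
            if (mask >> lower) & 1:
                val ^= solution[lower]
        solution[bit] = val
    return solution

source = [0x12, 0x34, 0x56]
rows = [0b111, 0b011, 0b001, 0b101]    # one redundant row
values = []
for mask in rows:
    v = 0
    for i, s in enumerate(source):
        if (mask >> i) & 1:
            v ^= s
    values.append(v)
decoded = solve_gf2(rows, values, k=3)
```

The elimination step is what gives the cubic cost
mentioned above; the sparse-graph codes of Section
4 avoid it.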

4. Fountain coding Methods 
4.1. LT code 
4.1.1. Introduction 

Luby Transform (LT) codes, proposed by Michael
Luby in 2002, are the first implementation of
digital fountain codes. They were designed to
reduce the encoding and decoding complexity of
random linear Fountain Codes while maintaining
the small overhead. With a good choice of degree
distribution, i.e. the distribution of the edges in
the Tanner graph, LT codes can come arbitrarily
close to channel capacity with a certain decoder
reliability and logarithmically increasing
encoding and decoding costs.

With LT codes, data is divided into fixed-size
blocks. Each block is divided into fixed-size
symbols, so the number of input symbols is fixed.
An unlimited number of coded symbols can be
generated by the LT encoder. All input symbols
can be recovered by the LT decoder once the
number of received encoding symbols is slightly
larger than the number of input symbols.

In order to reduce the complexity even more, we
can decrease the reliability of the decoder. Thus,
we would have a reduced degree distribution
resulting in linear-time encoding and decoding
complexity. However, the decoder cannot decode
all the input symbols with the lower degree
distribution for the same overhead constraint.
Therefore, an erasure-correcting pre-code is used
to correct the erasures arising from the weakened
decoder.

4.1.2. The construction

LT codes are the first practical rateless codes for 
the binary erasure channel. The encoder can 
generate as many encoding symbols as required to 
decode k information symbols. The encoding and 
decoding algorithms of LT codes are simple; they 
are similar to parity-check processes. LT codes 
are efficient in the sense that the transmitter does 
not require an acknowledgement (ACK) from the 
receiver. This property is especially desired in 
multicast channels because it will significantly 
decrease the overhead incurred by processing the 
ACKs from multiple receivers. 

LT codes are known to be efficient: k
information symbols can be recovered from any
$k + O(\sqrt{k}\,\ln^2(k/\delta))$ encoding
symbols with probability $1 - \delta$ using
$O(k \ln(k/\delta))$ operations. However, their
bit error rates cannot be decreased below some
lower bound, meaning they suffer from an error
floor.

In order to reduce the computational
complexity, the number of edges at the encoder
side should be reduced. LT codes can then be
thought of as sparse random linear Fountain
Codes with simple encoding and decoding
algorithms. Although simple and fixed encoding
and decoding schemes are defined for LT codes,
the degree distribution of the edges plays a
crucial role in the design of good codes. Good
codes are those that have low encoding and
decoding costs as well as a small overhead and a
low probability of decoding failure. Let us start
with the definitions of the encoding and decoding
schemes.

A. LT Coding 

Any number of encoding symbols can be 
independently generated from k information 
symbols by the following encoding process: 
1) Determine the degree d of an encoding 
symbol. The degree is chosen at random from 
a given node degree distribution P(x). 



2) Choose d distinct information symbols 
uniformly at random. They will be neighbors 
of the encoding symbol. 

3) Assign the XOR of the chosen d 
information symbols to the encoding symbol. 
This process is similar to generating parity 
bits except that only the parity bits are 
transmitted. 

The degree distribution P(x) can be understood
by drawing a bipartite graph, such as in Figure
(5), which consists of information symbols as
variable nodes and encoding symbols as factor
nodes. The degree distribution determines the
performance of LT codes, such as the number of
encoding symbols required and the probability of
successful decoding. The degree distribution is
analyzed in Section 4.1.3.

Figure (5): Generation of encoding symbols

The encoding symbols are transmitted through 
a BEC with the probability of erasure p. The 
special characteristic of a BEC is that receivers 
have correct data or no data. There is no 
confusion where the decoder needs to "guess" 
the original data; it recovers the true data or 
gives up. 
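
The three encoding steps listed in this subsection
can be sketched in Python as follows. The degree
distribution used here is a hypothetical
placeholder chosen only to make the example run; it
is not one of the distributions studied in the
paper, and the function name lt_encode_symbol and
the one-byte symbols are likewise assumptions.

```python
import random

def lt_encode_symbol(source, degree_probs, rng):
    """Generate one LT encoding symbol as (neighbors, value).

    degree_probs[d - 1] is the probability of degree d, d = 1..k."""
    k = len(source)
    # Step 1: draw the degree d from the degree distribution.
    d = rng.choices(range(1, k + 1), weights=degree_probs)[0]
    # Step 2: choose d distinct information symbols uniformly at random.
    neighbors = sorted(rng.sample(range(k), d))
    # Step 3: the encoding symbol is the XOR of its neighbors.
    value = 0
    for i in neighbors:
        value ^= source[i]
    return neighbors, value

rng = random.Random(2014)                  # fixed seed for repeatability
source = [0x0F, 0x33, 0x55, 0xAA]          # k = 4 one-byte symbols
degree_probs = [0.25, 0.5, 0.125, 0.125]   # hypothetical P(x)
packets = [lt_encode_symbol(source, degree_probs, rng) for _ in range(6)]
```

Each call is independent of the others, so the
encoder can keep producing symbols for as long as
receivers need them.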

B. LT Decoding 

For decoding of LT codes, a decoder needs to 
know the neighbors of each encoding symbol. 
This information can be transferred in several 
ways. For example, a transmitter can send a 
packet, which consists of an encoding symbol and 
the list of its neighbors. An alternative method is
that the encoder and the decoder share a random
number generator seed, and the decoder finds out
the neighbors of each encoding symbol by
generating random linear combinations
synchronized with the encoder.
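
The text does not spell out the decoding rule
itself. A common choice for LT codes, assumed
here, is the peeling decoder: any encoding symbol
with exactly one unknown neighbor reveals that
neighbor, which is then XORed out of every other
symbol that contains it. The sketch below takes
packets as (neighbors, value) pairs, i.e. the first
of the two neighbor-signalling options described
above; the name lt_decode is illustrative.

```python
def lt_decode(k, packets):
    """Peeling decoder for (neighbors, value) packets.

    Returns the k recovered symbols, or None if no packet with a
    single unknown neighbor remains before everything is known."""
    pending = [[set(nb), val] for nb, val in packets]
    recovered = [None] * k
    progress = True
    while progress:
        progress = False
        for pkt in pending:
            nb = pkt[0]
            # Strip neighbors that are already known from this packet.
            for i in [i for i in nb if recovered[i] is not None]:
                nb.discard(i)
                pkt[1] ^= recovered[i]
            # A packet with one unknown neighbor releases that symbol.
            if len(nb) == 1:
                i = nb.pop()
                recovered[i] = pkt[1]
                progress = True
    if all(s is not None for s in recovered):
        return recovered
    return None

# Source symbols 1, 2, 3, 4 encoded as a chain of XORs.
packets = [([0], 1), ([0, 1], 3), ([1, 2], 1), ([2, 3], 7)]
decoded = lt_decode(4, packets)
```

The set of packets that currently have one unknown
neighbor is the "ripple" referred to in the next
subsection; decoding stalls when it empties too
early.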

4.1.3. Degree Distribution 

LT codes do not have a fixed rate, and hence the
desired property is that the probability of
successful recovery is as high as possible while
the number of encoding symbols required is kept
small. Describing this property in the terminology
of the LT process:

• The release rate of encoding symbols is
low in order to keep the size of the ripple
small and prevent waste of encoding
symbols;

• The release rate of encoding symbols is
high enough to keep the ripple from dying
out.

Therefore, the degree distribution of encoding
symbols needs to be carefully designed to balance
this trade-off. This is the reason that the degree
distribution plays an important role in LT codes.
We investigate several probability distributions
as the degree distribution used in the Frame
Fountain encoding process [10]. They are
presented as follows:

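1) Uniform distribution: $p_i = 1/n$,
$i = 1, 2, 3, \ldots, n$

2) Normal distribution ($\mu = [k/2]$,
$\delta = k/2$):

$p_i = \frac{1}{\sqrt{2\pi\delta^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\delta^2}\right)$, $i = 1, \ldots, k$

where $x_i = [\mathrm{randn} \cdot \delta + \mu]$

3) Sequential distribution: [formula not
recoverable; surviving fragment: n - [n/k]]

4) Ideal Soliton distribution:

$p_1 = 1/n$
$p_i = \frac{1}{i(i-1)}$ for $i = 2, 3, \ldots, n$

5) Robust Soliton distribution: first define
$R = c \ln(n/\delta)\sqrt{n}$, where c and $\delta$
are extra parameters; c > 0 is some suitable
constant. Then

$\tau(i) = R/(in)$ for $i = 1, \ldots, n/R - 1$;
$\tau(i) = R \ln(R/\delta)/n$ for $i = n/R$;
$\tau(i) = 0$, otherwise.

The Robust Soliton distribution is obtained by
adding $\tau(i)$ to the Ideal Soliton distribution
and normalizing the sum to 1.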

enough frames can be encoded together as 
redundant frames to make sure there are enough 
diversity of encoded frames at the receiver. 
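
The Ideal Soliton and Robust Soliton distributions
named in the list above can be computed with the
short Python sketch below. It follows the standard
definitions of these two distributions, stated here
as an assumption since the formulas are damaged in
the source; rounding n/R to the nearest integer and
the parameter values c = 0.1 and delta = 0.5 are
also choices made only for the example.

```python
import math

def ideal_soliton(n):
    """p_1 = 1/n and p_i = 1/(i(i-1)) for i = 2..n (index 0 is degree 1)."""
    return [1.0 / n] + [1.0 / (i * (i - 1)) for i in range(2, n + 1)]

def robust_soliton(n, c, delta):
    """Ideal Soliton plus the correction tau, normalized to sum to 1."""
    R = c * math.log(n / delta) * math.sqrt(n)
    spike = int(round(n / R))     # assumption: n/R rounded to an integer
    tau = [0.0] * n
    for i in range(1, n + 1):
        if i < spike:
            tau[i - 1] = R / (i * n)
        elif i == spike:
            tau[i - 1] = R * math.log(R / delta) / n
    rho = ideal_soliton(n)
    beta = sum(rho) + sum(tau)    # normalization constant
    return [(r + t) / beta for r, t in zip(rho, tau)]

rho = ideal_soliton(100)
mu = robust_soliton(100, c=0.1, delta=0.5)
```

Compared with the Ideal Soliton distribution, the
robust variant puts more weight on degree 1 and
adds a spike near degree n/R, which is intended to
keep the ripple from dying out during decoding.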

4.2. Tornado code 

4.2.1. Introduction 

Tornado codes are a class of erasure codes that
first appeared in a technical report in 1997. These
randomized codes have linear-time encoding and
decoding algorithms. They can be used to
transmit over lossy channels at rates extremely
close to capacity. The encoding and decoding
algorithms for Tornado codes are both simple and
faster by orders of magnitude than the best
software implementations of standard erasure
codes. Tornado codes are expected to be
extremely useful for applications such as reliable
distribution of bulk data, including software
distribution, video distribution, news and
financial data distribution, popular web site
access, database replication, and military
communications.

Despite the simplicity of Tornado codes, their
design and analysis are mathematically
interesting. The design requires the careful
choice of a random irregular bipartite graph,
where the structure of the irregular graph is
extremely important. We model the progress of
the decoding algorithm by a simple AND-OR
tree analysis which immediately gives rise to a
polynomial in one variable with coefficients
determined by the graph structure. Based on
these polynomials, we design a graph structure
that guarantees successful decoding with high
probability.

Tornado codes are erasure block codes based
on irregular sparse graphs. Given an erasure
channel with loss probability p, they can correct
up to $p(1 - \epsilon)$ errors. They can be
encoded and decoded in time proportional to
$n \log(1/\epsilon)$. As shown in Figure (6), there
are eight input symbols named x1, x2, ..., x8.
With Tornado codes, four encoding symbols
named y1, y2, y3 and y4 are produced from the
eight input symbols. Any one of y1, y2, y3 and y4
can be recovered from the other three. However,
the complexity of the encoding and decoding
algorithms for Tornado codes is proportional to
the block length, which makes Tornado codes
inadequate for large data transfer systems.




Figure (6): Tornado codes 



Tornado codes belong to a new class of
error-correcting codes for erasure channels, based
on the construction of randomly connected
irregular bipartite graphs. Using a carefully
chosen graph structure, Tornado codes can
achieve nearly optimal efficiency. With linear
complexity for encoding and decoding and rather
simple and fast algorithms, they are well suited to
the encoding of large amounts of data, and serve
as an alternative to the classical ARQ concept as
used on the Internet.

Tornado codes are erasure block codes based on irregular sparse
graphs [14], and hence they are not rateless. They are generated
by cascading a sequence of irregular random bipartite graphs,
which are equivalent to generator matrices. The operation of one
such graph is shown in Figure 7. The nodes on the left are known.
The value of each node on the right is computed by XORing its
neighboring input nodes. Given an erasure channel with loss
probability $p$, they can correct up to $p(1-\varepsilon)$
errors. They can be encoded and decoded in time proportional to
$n \log(1/\varepsilon)$. Tornado codes were thus primarily
designed to speed up erasure coding over the Internet.
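
The XOR rule for one bipartite graph can be illustrated with a
short sketch. The graph, the symbol values and the function names
below are hypothetical and chosen only for illustration; a real
Tornado code cascades several such graphs with carefully designed
degree distributions.

```python
from functools import reduce
from operator import xor


def encode_checks(inputs, graph):
    """Compute each right-hand node as the XOR of its left-hand neighbours."""
    return [reduce(xor, (inputs[i] for i in neighbours), 0) for neighbours in graph]


def peel_erasures(inputs, checks, graph):
    """Fill in erased inputs (None) using checks that have exactly one erased neighbour."""
    inputs = list(inputs)
    progress = True
    while progress:
        progress = False
        for check, neighbours in zip(checks, graph):
            missing = [i for i in neighbours if inputs[i] is None]
            if len(missing) == 1:
                known = reduce(
                    xor, (inputs[i] for i in neighbours if inputs[i] is not None), 0
                )
                inputs[missing[0]] = check ^ known
                progress = True
    return inputs


# Hypothetical irregular graph: 8 input symbols, 4 check symbols.
# GRAPH[j] lists the input nodes connected to check node j.
GRAPH = [[0, 1, 2], [2, 3, 4, 5], [5, 6], [0, 3, 6, 7]]
data = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
checks = encode_checks(data, GRAPH)

received = list(data)
received[1] = None  # two symbols erased by the channel
received[4] = None
decoded = peel_erasures(received, checks, GRAPH)
```

Each check node touches only a few inputs, which is why the cost
of encoding and decoding grows with the number of edges rather
than with the square of the block length.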





Figure (7): Irregular Bipartite Graph 
4.2.2. Construction 

We consider a system model in which a single 
transmitter performs bulk data transfer to a larger 
number of users on an erasure channel. Our 
objective is to achieve complete file transfer with 
the minimum number of encoding symbols and 
low decoding complexity. For k information 
symbols, RS codes can achieve this with k log k 
encoding and quadratic decoding times. The 
reason for the longer decoding time is that in RS 
codes, every redundant symbol depends on all 
information symbols. By contrast, every 
redundant symbol depends only on a small 
number of information symbols in Tornado codes. 
Thus they achieve linear encoding and decoding 
complexity, with the cost that the user requires 
slightly more than k packets to successfully 
decode the transmitted symbols. The main 
contribution is the design and analysis of optimal 
degree distributions for the bipartite graph such 
that the receiver is able to recover all missing bits 
by a simple erasure decoding algorithm. The 
innovation of Tornado code has also inspired 
work on irregular LDPC codes. 

4.3. Raptor code 
4.3.1. Introduction 

Raptor codes are an extension of LT codes combined with a
pre-coding scheme. They can produce a potentially infinite stream
of symbols such that any subset of symbols of size
$k(1+\varepsilon)$ is sufficient to recover the original $k$
symbols with high probability. Each output symbol is generated
using $O(\log(1/\varepsilon))$ operations, and the original
symbols are recovered from the collected ones with
$O(k \log(1/\varepsilon))$ operations. The main idea of Raptor
codes is to relax the condition of recovering all input symbols
and to require only that a constant fraction of the input symbols
be recoverable. The Tanner graph then has only a constant average
degree, which yields linear-time encoding and decoding. The
remaining symbols are recovered by an erasure-correcting pre-code
that works in linear time. The degree distribution used for
Raptor codes should be completely different from that of LT
codes, because the goal is to recover as many input symbols as
possible for a given constant average degree rather than to
recover all input symbols. Raptor (rapid Tornado) codes were
developed and patented in 2001 as a way to reduce the decoding
cost to $O(1)$ by preprocessing the LT code with a standard
erasure block code (such as a Tornado code). Degree distribution
design and pre-coding are the heart of Raptor codes. To
understand Raptor codes, one has to know the fountain approach to
coding theory.

4.3.2. The construction 

Digital Fountain, Inc. proposed Raptor codes in 2006. A Raptor
code is a concatenation of a systematic pre-code with an LT code.
As shown in Figure (8), the pre-code first maps the $k$ native
symbols to $(1+\varepsilon)k$ pre-coded symbols. An unlimited
number of coded symbols can then be generated from the pre-coded
symbols by the LT code. In the decoding process, the pre-coded
symbols are first recovered by LT decoding, and the input symbols
are then recovered from the pre-coded symbols. Raptor codes have
been standardized in 3GPP (Third Generation Partnership Project).







Fig (8): Raptor codes 

A Raptor code can achieve constant per-symbol encoding and
decoding cost with overhead close to zero and space proportional
to $k$. It has been shown to be the closest code to the ideal
universal digital fountain. A similar line of work was proposed
under the name online codes. We have already seen two extreme
cases of Raptor codes. When there is no pre-code, we have the LT
code [11]. The other extreme is the pre-code only (PCO) Raptor
code.

A. Raptor Coding 

Raptor coding starts with a suitable design of the pre-code.
Shokrollahi uses LDPC codes as a pre-code with a constant rate of
$(1+\varepsilon/2)/(1+\varepsilon)$, for which the BP algorithm
works in linear time and can decode a
$(\varepsilon/4)/(1+\varepsilon)$ fraction of erasures, where
$\varepsilon$ is a positive real number. Next, the intermediate
symbols are encoded with LT coding using a suitable degree
distribution.

In our design, we also considered LDPC codes as the pre-code of
the Raptor code. The average degree of the distributions used for
LT codes is around $d_{av} = 3$. According to the balls-and-bins
problem, $e^{-3} \approx 5\%$ of the input symbols are not
recoverable on average. Therefore, the pre-code should be capable
of correcting at least 5% of the erasures.
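
The 5% figure can be checked numerically. The sketch below
assumes a simplified balls-and-bins model in which every edge of
the LT graph picks an input symbol uniformly and independently;
the function name and the choice of k = 1000 are illustrative
only.

```python
import math


def uncovered_fraction(k, n_output, avg_degree):
    """Probability that a given input symbol is touched by no output symbol.

    Assumes n_output * avg_degree edges, each landing on one of the k
    input symbols uniformly at random and independently of the others.
    """
    edges = n_output * avg_degree
    return (1.0 - 1.0 / k) ** edges


k = 1000
finite_k = uncovered_fraction(k, n_output=k, avg_degree=3)
limit = math.exp(-3)  # value approached as k grows
print(f"k = {k}: {finite_k:.4f}, limit e^-3: {limit:.4f}")
```

An input symbol that no output symbol touches can never be
recovered by LT decoding alone, which is why the pre-code has to
absorb roughly this fraction of erasures.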



B. Raptor Decoding 

Raptor decoding starts with the LT decoding process. In the
example, LT decoding can recover all the intermediate symbols
except the ones filled with black. Since the pre-code is
systematic, the first three input symbols are immediately
recovered. The fifth intermediate symbol is encoded by XORing the
third and fourth input symbols, so we can recover the fourth
input symbol by adding (XORing) the fifth intermediate symbol to
the third input symbol. Hence the decoding process succeeds. LT
decoding is done in the same way as described earlier in this
paper, and LDPC decoding is performed using the BP algorithm.
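
The recovery step in this example can be written out as a small
sketch. It assumes four input symbols and a single parity symbol;
the symbol values, the parity_rules layout and the function name
are hypothetical, and a real pre-code would be a full LDPC code
decoded with BP.

```python
def precode_decode(intermediate, parity_rules, k):
    """Recover missing systematic symbols from parity symbols by XOR.

    intermediate: list of intermediate symbols, None where LT decoding failed.
    parity_rules: maps a parity symbol index to the input indices XORed into it.
    k: number of input symbols (the first k intermediate symbols).
    """
    symbols = list(intermediate)
    changed = True
    while changed:
        changed = False
        for parity_idx, members in parity_rules.items():
            if symbols[parity_idx] is None:
                continue
            missing = [i for i in members if symbols[i] is None]
            if len(missing) == 1:
                value = symbols[parity_idx]
                for i in members:
                    if symbols[i] is not None:
                        value ^= symbols[i]
                symbols[missing[0]] = value
                changed = True
    return symbols[:k]


inputs = [0x0A, 0x0B, 0x0C, 0x0D]
# Fifth intermediate symbol = third input XOR fourth input (0-based: 2 and 3).
parity_rules = {4: [2, 3]}
sent = inputs + [inputs[2] ^ inputs[3]]

# LT decoding recovered everything except the fourth intermediate symbol.
after_lt = [sent[0], sent[1], sent[2], None, sent[4]]
recovered = precode_decode(after_lt, parity_rules, k=4)
```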

The table below summarizes the characteristics of 
various codes that are designed for the digital 
fountain ideal: 





                                  Tornado                                  LT                    Raptor
Rateless                          No                                       Yes                   Yes
Overhead                          $\varepsilon$                            $\varepsilon \to 0$   $\varepsilon \to 0$
Encoding complexity per symbol    $O(\varepsilon \ln(1/\varepsilon))$      $O(\ln(k))$           $O(1)$
Decoding complexity per symbol    $O(\varepsilon \ln(1/\varepsilon))$      $O(\ln(k))$           $O(1)$
Space per symbol                  $O(1)$                                   $O(1)$                $O(1)$, with a larger constant



Table 1 : Summary of fountain codes 



5. The Conclusion 

It is clear that Fountain codes are flexibly applicable where a
fixed code rate cannot be determined a priori and where efficient
encoding and decoding of large amounts of data is required.

1) Fountain codes are suitable for bulk data
distribution over the Internet.

2) The decoding time is acceptable when data
rates are of the order of Mbps.

3) They can be used when multiple clients over
channels of different quality are being served
simultaneously.

4) A more advanced Raptor code with greater
flexibility and improved reception overhead,
called RaptorQ, has been introduced into the
IETF. This code can be used with up to
56,403 source symbols in a source block, and
a total of up to 16,777,216 encoded symbols
generated for a source block. This code is able
to recover a source block from any set of
encoded symbols equal in number to the
source symbols in the source block with high
probability, and in rare cases from slightly
more than the number of source symbols in
the source block.
Abstract: A sensor network has many sensor nodes with limited
energy. One of the important issues in these networks is
increasing the lifetime of the network. This paper proposes a
hybrid algorithm that uses a genetic algorithm in the first stage
to choose the best sensors as cluster heads and a sleep/wake-up
mechanism for redundant sensors in the second stage. The
algorithm balances the energy consumption in the network and
improves network lifetime and coverage preservation.

Keywords: Wireless Sensor Network, Genetic Algorithm, Clustering,
Sleep/Wakeup Mechanism, Lifetime.

I. Introduction 

In recent years, the advancement of communication technology and
the electronics industry has led to the production of relatively
cheap and small sensors that communicate through a wireless
network [1]. These networks, known as wireless sensor networks,
have become useful for collecting data from the surrounding
environment and monitoring environmental events. Their
application at home and in the military is increasing [2]. Work
in the area of cluster-based wireless sensor networks is quite
extensive, with energy efficiency and power consumption being the
main focus of the clustering algorithms presented so far.
Similarly, much research has been done on sensor activation
algorithms, which focus on selecting a subset of active sensor
nodes that is sufficient to satisfy the network coverage
requirements while allowing the remaining sensors to conserve
energy by entering the sleep state. In this section we discuss
the related work that has been done in both these areas. In
designing wireless sensor networks, the basic problems are the
limited energy resources of the sensors and coverage preservation
over a long period of time. Moreover, because of the large number
of sensors in the network and the lack of access to them,
replacing or recharging the sensor batteries is impossible. As a
result, there is a clear need for methods that optimize energy
consumption and increase network lifetime [1]. Previous research
shows that higher levels of energy efficiency can be achieved by
organizing network nodes in clusters. Higher energy efficiency
leads to increased network lifetime. In most studies, network
lifetime is defined as the time period before the death of the
first or the last node of the network [3].

Many clustering algorithms have been proposed [4,5]. A typical
clustering algorithm is low-energy adaptive clustering hierarchy
(LEACH), one of the best-known algorithms in wireless sensor
networks. It has two phases: a setup phase and a steady-state
phase. In the setup phase, one node in each cluster is chosen as
the cluster head. In the steady-state phase, data is transmitted
in a single hop: the data collected from the member nodes is
first processed locally in the cluster heads and then sent in the
form of a packet to the base station. As energy consumption in
cluster heads is higher than in normal nodes, their energy is
used up after a while. For this reason, LEACH employs a rotation
algorithm for choosing cluster heads. All of these methods can
measure the lifetime of wireless sensor networks, but they are
not able to quantify their coverage. The coverage metric in
wireless sensor networks has been the subject of increasing
attention in recent years. The PEAS algorithm is a localized
coverage algorithm which assumes that nodes have the same
communication range and sensing range in an asynchronous network
[8]. This algorithm holds that nearby nodes have similar sensing
coverage, so nodes whose distance to an active node is too short
can switch to the sleep state. All nodes are in the sleep state
at the beginning; nodes then wake up periodically and send a
detection message. All the active nodes within the communication
range of the probing node receive this detection message and
judge whether their distance from the probing node is smaller
than a fixed threshold. If so, they send a response message;
otherwise they send nothing. A probing node that receives a
response message goes back to sleep; otherwise it is activated.
An activated node keeps its active state until its energy is
exhausted. Though this algorithm has good fault tolerance, it
cannot ensure that the area is fully covered and easily causes
coverage holes. In [9] the authors propose a scheduling algorithm
that enables each node to enter the active or sleeping state
based on the coverage information obtained from its neighbors,
without compromising full network coverage. To avoid the blind
point problem, which occurs when two neighboring nodes
simultaneously decide to sleep and leave part of the area
uncovered, the authors introduce a random back-off time before a
node makes a decision about its state. In this research the blind
point problem is solved by introducing delays in node activation
based on the current energy of the neighboring nodes. The problem
of scheduling nodes to enter the sleep state in cluster-based
sensor networks was studied in [10]. The authors proposed a
linear distance-based sleep scheduling algorithm, where the
probability that a sensor enters the sleeping state is
proportional to its distance from the cluster head. However, this
scheme causes unequal power consumption among the sensor nodes.

II. LEACH PROTOCOL

LEACH is a clustering algorithm for wireless sensor networks. The
basic features of this protocol are as follows:

✓ The base station is far from the sensor nodes
✓ The base station is fixed
✓ All the sensor nodes have the same initial energy

In the LEACH algorithm [1], each node produces a random number in
[0, 1]. If the random number is below the threshold T(n) of
Equation (1), the node is chosen as a cluster head. Here r, p and
G are the number of the current round, the desired percentage of
cluster heads and the set of sensor nodes which have not been
selected as a cluster head in the last 1/p rounds, respectively.

The value of the threshold for a sensor n is formulated as
Equation (1):

$$T(n) = \begin{cases} \dfrac{p}{1 - p \left( r \bmod \frac{1}{p} \right)} & \text{if } n \in G \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

When a node is selected as a cluster head, it broadcasts a
message to its neighbors. The nodes receiving this message decide
which cluster head to join based on the respective signal
strength.

✓ Problems of LEACH

Depending on the thresholds, the number of cluster heads in some
rounds may be zero. In addition, the position of the nodes (the
distance between the selected cluster heads) is disregarded when
choosing cluster heads. This causes the density of cluster head
nodes to be high in some areas and low in others. The nodes
located in areas of low density use much energy to send their
data to the related cluster heads (due to the great distance
between nodes and cluster heads). These are among the main
problems of LEACH. As a result, a part of the network loses its
connection with the rest of the network.

We will introduce a clustering algorithm that has two phases:
✓ Select cluster heads with a genetic algorithm
✓ Use a sleep/wake-up mechanism for redundant nodes

III. PROPOSED ALGORITHM

The purpose of the proposed algorithm is to solve the problems
existing in the LEACH algorithm. The proposed algorithm has the
following capabilities:

✓ Applying the genetic algorithm to select cluster heads
✓ Applying the sleep/wake-up mechanism to determine
redundant nodes
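
As a point of comparison for the proposed algorithm, the LEACH
election rule of Equation (1) can be sketched as follows. The
function names, the node identifiers and the seeded random
generator are assumptions made for this illustration and are not
taken from the original simulation.

```python
import random


def leach_threshold(p, r, in_g):
    """Threshold T(n) of Equation (1) for round r and cluster-head fraction p."""
    if not in_g:
        return 0.0
    period = round(1 / p)  # number of rounds in one rotation epoch
    return p / (1 - p * (r % period))


def elect_cluster_heads(node_ids, p, r, eligible, rng):
    """Each node draws a number in [0, 1) and becomes a head if it is below T(n)."""
    return [
        n for n in node_ids
        if rng.random() < leach_threshold(p, r, n in eligible)
    ]


rng = random.Random(0)  # seeded only to make the sketch repeatable
nodes = list(range(100))
heads = elect_cluster_heads(nodes, p=0.05, r=0, eligible=set(nodes), rng=rng)
```

Because every node decides independently, nothing prevents a
round from electing no cluster head at all or from electing heads
that lie close together, which is the weakness described above.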



➢ First phase

In the proposed algorithm, a genetic algorithm is used for
selecting the cluster heads, with several parameters in the
fitness function. The new method is based on the idea that,
instead of clustering the nodes first, the effective cluster
heads are selected first and the nodes near these cluster heads
are then placed in the related cluster. After the cluster heads
are selected, a membership command is issued to each of the
nodes. Based on its distance from the cluster heads, each node
sends a request confirming its membership to the nearest cluster
head. The new method ensures that the nodes of a cluster are at
shorter distances from their cluster head than in other forms of
clustering. Therefore, we can save the energy required for
sending data to the cluster heads.
The main criteria of the fitness function applied in the proposed
method are as follows:

✓ Number of selected cluster heads
The main purpose of the proposed method is to obtain an optimal
number of clusters. Since each cluster head is the representative
of a cluster, choosing cluster head K is in fact the same as
choosing cluster K. The purpose of defining this parameter is to
study and organize the steps of sending data. As the number of
cluster heads increases, the number of steps required for sending
data to the sink through cluster heads increases. As a result,
this reduces the amount of consumed energy.

✓ Distance between cluster heads
The distance of the cluster heads from each other is another
influential parameter in sending data to the sink in sensor
networks. In an operational environment covered by a sensor
network, the distance of the cluster heads from each other shows
the degree of scattering of the nodes. The cluster heads should
be neither too close to nor too far from each other. If cluster
heads are near each other, clusters there are compact and highly
concentrated, while in other areas the clusters are weakly
concentrated. On the other hand, the distance between cluster
heads should not exceed the allowable threshold, because sending
data through the cluster heads then becomes problematic and the
cluster heads consume more energy.

✓ Internal distance in clusters
The internal distance of a cluster shows how close its nodes are
to each other. If the internal distance of a cluster is small,
the nodes are concentrated, and this concentration guarantees
data gathering. Conversely, if the internal distance of the
cluster is large, the nodes are far from each other. In a
wireless sensor network, if all the clusters are optimal (i.e.,
the internal distances of all clusters are optimal), the whole
network is in an optimal state.
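
The three criteria can be combined into a single cost that a
genetic algorithm would minimize. The text does not give the
actual fitness formula, so the sketch below is only one possible
reading: the equal weights, the target_heads and target_gap
parameters, and the function name are all assumptions.

```python
import math
from itertools import combinations


def clustering_cost(nodes, head_ids, target_heads, target_gap, weights=(1.0, 1.0, 1.0)):
    """Lower is better. Hypothetical combination of the three criteria.

    nodes: list of (x, y) positions; head_ids: indices of the chosen cluster heads.
    """
    if not head_ids:
        return math.inf
    head_set = set(head_ids)
    heads = [nodes[i] for i in head_ids]
    members = [pos for i, pos in enumerate(nodes) if i not in head_set]

    # Criterion 1: number of selected cluster heads.
    count_term = abs(len(heads) - target_heads)

    # Criterion 2: heads should be neither too close nor too far apart.
    gaps = [math.dist(a, b) for a, b in combinations(heads, 2)]
    spread_term = sum(abs(g - target_gap) for g in gaps) / len(gaps) if gaps else 0.0

    # Criterion 3: internal distance, each node joining its nearest head.
    if members:
        internal_term = sum(
            min(math.dist(m, h) for h in heads) for m in members
        ) / len(members)
    else:
        internal_term = 0.0

    w_count, w_spread, w_internal = weights
    return w_count * count_term + w_spread * spread_term + w_internal * internal_term


nodes = [(0, 0), (0, 10), (10, 0), (100, 100), (100, 90)]
spread_out = clustering_cost(nodes, [0, 3], target_heads=2, target_gap=100.0)
bunched_up = clustering_cost(nodes, [0, 1], target_heads=2, target_gap=100.0)
```

In this toy layout, choosing one head in each group of nodes
gives a lower cost than choosing two heads in the same group,
which matches the intent of the second and third criteria.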



To obtain a wireless sensor network in which the rate of consumed
energy decreases, we can use an algorithm that forms clusters
based on cluster heads. As the name suggests, clustering is done
based on cluster heads rather than by dividing the nodes. The
proposed algorithm has two phases:
✓ First phase: select cluster heads using a genetic algorithm
✓ Second phase: use a sleep/wake-up mechanism in each cluster
for redundant nodes

➢ Second phase
This stage is designed to solve the following problem: how can
the nodes in each cluster be scheduled to sleep so that the
region still has high coverage (coverage preservation) and the
network keeps the longest possible lifetime?
To this end, we perform the following steps:

✓ Each node sends a message to its neighbors containing the
following information about the sender: node ID, node position,
remaining energy and angle of view.
✓ Each node uses these parameters to calculate the amount of
overlap with its neighbors.
✓ If a node's angle of view is covered by its neighbors, we
consider it a redundant node.
✓ Once a redundant node has been found, we can turn it off.
✓ Based on the remaining energy of the neighbors (the lowest
remaining energy is found), we assign each redundant node a
threshold that indicates when it should be turned on again.

Therefore, the algorithm can be summarized as follows:
✓ Random distribution of the sensors
✓ Cluster formation phase
  ➢ Select cluster heads using the genetic algorithm
  ➢ Form the clusters
✓ Determine redundant nodes in each cluster
  ➢ Turn redundant nodes off
  ➢ Turn redundant nodes on according to their
  specified thresholds.
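
The redundancy test of the second phase can be sketched under
simplifying assumptions. Here every sensor is assumed to cover a
full disc of the same radius instead of a limited angle of view,
and coverage is checked on a sample grid; both choices, and the
function names, are illustrative and not taken from the paper.

```python
import math


def is_redundant(node, neighbours, radius, grid=10):
    """True if every sampled point of the node's sensing disc is covered by a neighbour."""
    x0, y0 = node
    for i in range(-grid, grid + 1):
        for j in range(-grid, grid + 1):
            point = (x0 + i * radius / grid, y0 + j * radius / grid)
            if math.dist(point, node) > radius:
                continue  # sample lies outside the sensing disc
            if not any(math.dist(point, nb) <= radius for nb in neighbours):
                return False
    return True


def weakest_neighbour_energy(neighbour_energies):
    """Lowest remaining energy among the neighbours.

    The text derives the wake-up threshold from this value; the exact
    mapping from energy to wake-up time is not specified there.
    """
    return min(neighbour_energies)


covered = is_redundant((0.0, 0.0), [(0.0, 0.0)], radius=10.0)
isolated = is_redundant((0.0, 0.0), [(100.0, 0.0)], radius=10.0)
```

A node for which is_redundant returns True could be put to sleep
without opening a coverage hole, as long as the neighbours that
cover its area stay active.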

IV. SIMULATION AND RESULT

In this section, the efficiency of the proposed algorithm and of
LEACH is evaluated in Matlab. The model and assumptions of the
wireless sensor network used for the proposed algorithm are as
follows:

✓ The network has N sensor nodes, scattered randomly in a
predetermined environment.
✓ The sensors and the sink have fixed positions and are not
movable.
✓ All the sensors are identified by their unique IDs.
✓ The nodes learn each other's positions through the strength of
the signals they receive.
✓ The presented algorithm runs in the sink and the result is
announced to the nodes.
✓ The algorithm can be run periodically, or it can be run again
when nodes are eliminated.

In this simulation, the comparison criterion is the energy
consumed in sending two thousand random events in the sensor
network. Because most energy is spent in sending messages, and
the energy consumed in receiving messages is equal in all
sensors, in this paper we focus on the energy spent in sending
for a specified number of random events. The energy consumed for
sending k bits of data is obtained through Equation (3):

$$E_t = E_e + E_d \cdot k \qquad (3)$$

Here k is the number of bits sent to the destination node, $E_e$
is the energy consumed by the electric and electronic equipment
in the sensor itself, and $E_d$ is the energy term that varies
with the distance d between the source sensor and the
destination. $E_d$ is obtained through Equation (4):

$$E_d = \begin{cases} \varepsilon_{fs}\, d^{2} & \text{if } d < d_{crossover} \\ \varepsilon_{amp}\, d^{4} & \text{if } d > d_{crossover} \end{cases} \qquad (4)$$

In this equation, $d_{crossover}$ is the threshold distance.

The parameters used in the algorithm are shown in Table 1.

Table 1: Parameters of simulation
Parameter                    Value
Number of nodes              100
Area                         100 × 100
Location of base station     (100, 100)
Initial energy               1 J
$E_e$                        50 nJ/bit
$\varepsilon_{fs}$           10 pJ/bit/m^2
$\varepsilon_{amp}$          0.0013 pJ/bit/m^4
$d_{crossover}$              87
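
The energy model of Equations (3) and (4) with the values of
Table 1 can be sketched as follows. Two assumptions are made
here: the constant names are chosen for this illustration, and
the electronics energy is charged once per bit because Table 1
lists it in nJ/bit, whereas Equation (3) as printed adds it only
once per message.

```python
E_ELEC = 50e-9         # J/bit, electronics energy (Table 1)
EPS_FS = 10e-12        # J/bit/m^2, short-distance amplifier coefficient
EPS_AMP = 0.0013e-12   # J/bit/m^4, long-distance amplifier coefficient
D_CROSSOVER = 87.0     # threshold distance from Table 1


def amplifier_energy_per_bit(d):
    """Distance-dependent term of Equation (4)."""
    if d < D_CROSSOVER:
        return EPS_FS * d ** 2
    return EPS_AMP * d ** 4


def transmit_energy(k_bits, d):
    """Energy in joules to send k_bits over distance d (per-bit electronics cost assumed)."""
    return k_bits * (E_ELEC + amplifier_energy_per_bit(d))


near = transmit_energy(1000, 10.0)   # short hop, d^2 regime
far = transmit_energy(1000, 100.0)   # long hop, d^4 regime
```

The jump from the quadratic to the quartic regime is the reason
why nodes far from their cluster head drain much faster, which
the proposed clustering tries to avoid.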



To study the performance of the new clustering algorithm, we also
consider the lifetime of the network as one of the main criteria
for comparing methods. In the proposed method, the lifetime is
studied as follows. At first, the network randomly faces
different events at different points. Each sensor located near an
event senses it and sends the received data to the nearest
cluster head. Depending on the amount of data sent and the
distance between the sensor node and the cluster head, the energy
of the sensor decreases. Because of the abundance of events, the
energy of some sensors runs out over time, and from then on those
sensors are not able to send data. This is called sensor death.
We record the number of alive nodes in each round. Based on this
criterion, Fig. 1 compares the proposed algorithm and the LEACH
algorithm. As can be observed, the proposed algorithm is more
efficient than the LEACH algorithm.
In Fig. 2, the energy consumption of the proposed algorithm and
of LEACH is shown. Fig. 2 shows the energy consumption of the
network until the first sensor is dead; the program stops running
when the first sensor is out of energy. For example, in the LEACH
algorithm the first node runs out of energy and leaves the
network in the 65th round, while the energy of the whole network
is 87 J. In the proposed algorithm, however, the first node
leaves the network in the 180th round, with approximately 83 J of
energy remaining in the whole network. This shows that the
proposed algorithm achieves a better balance in the energy
consumption of the sensors used in the network, which is a result
of the balanced clustering in the proposed algorithm.




Fig. 2. Comparison of the proposed algorithm and LEACH until the first node is dead (x-axis: event number).




In Fig. 3, the proposed algorithm and the LEACH algorithm are
compared based on the amount of energy remaining in the network.
As shown in the figure, the proposed algorithm is more efficient
than LEACH.



Fig. 1. Comparison of the number of alive nodes between the proposed algorithm and
LEACH (x-axis: event number, 0 to 1400).

Fig. 3. Comparison of total energy consumption between the proposed algorithm and
LEACH

In the following table, the proposed algorithm and the LEACH
algorithm are compared regarding the minimum remaining energy
among the nodes of the network in subsequent rounds. For example,
in the LEACH algorithm in the 20th round, the weakest sensor
(i.e., the sensor with the least energy) has 0.39 J of energy. In
the proposed algorithm, however, the weakest sensor has more
remaining energy.

Table 2. Comparison between the proposed algorithm and LEACH in
the node with minimum energy in the network

Weakest sensor \ Event    20th      80th      100th
Proposed algorithm        0.76 J    0.55 J    0.41 J
LEACH                     0.39 J    0.01 J    0.0 J

V. Conclusions

The algorithm proposed in this paper is a smart and effective
algorithm. The results obtained from simulation indicate that the
clusters have smaller internal distances. Therefore, our
objective, which was to gather data in the clusters, was
achieved. Note that this algorithm is first run in the sink, and
the output of the algorithm is then announced to all the nodes.
Configuration of the network is done in the BS the first time.
The algorithm can then start reconfiguration of the network
periodically or when the cluster heads announce that their energy
is running out. Cluster heads have a balanced distance from all
the internal nodes, and they also have more remaining energy than
other nodes. Therefore, they are optimal and suitable
representatives for the clusters. In this paper we focused on the
issue of clustering by means of a genetic algorithm. In this
regard, we studied the kinds of clustering, each of which affects
the lifetime of the network in some way. Based on the algorithm
presented in this paper, we obtained a clustering algorithm that
increases network lifetime and improves coverage preservation.



Systems, Security for Critical Infrastructures, Security for P2P systems and Grid Systems, Security in E-Commerce,
Security and Privacy in Wireless Networks, Secure Mobile Agents and Mobile Code, Security
Protocols, Security Simulation and Tools, Security Theory and Tools, Standards and Assurance Methods, 
Trusted Computing, Viruses, Worms, and Other Malicious Code, World Wide Web Security, Novel and 
emerging secure architecture, Study of attack strategies, attack modeling, Case studies and analysis of 
actual attacks, Continuity of Operations during an attack, Key management, Trust management, Intrusion 
detection techniques, Intrusion response, alarm management, and correlation analysis, Study of tradeoffs 
between security and system performance, Intrusion tolerance systems, Secure protocols, Security in 
wireless networks (e.g. mesh networks, sensor networks, etc.), Cryptography and Secure Communications, 
Computer Forensics, Recovery and Healing, Security Visualization, Formal Methods in Security, Principles 
for Designing a Secure Computing System, Autonomic Security, Internet Security, Security in Health Care 
Systems, Security Solutions Using Reconfigurable Computing, Adaptive and Intelligent Defense Systems, 
Authentication and Access control, Denial of service attacks and countermeasures, Identity, Route and
Location Anonymity schemes, Intrusion detection and prevention techniques, Cryptography, encryption
algorithms and Key management schemes, Secure routing schemes, Secure neighbor discovery and 
localization, Trust establishment and maintenance, Confidentiality and data integrity, Security architectures, 
deployments and solutions, Emerging threats to cloud-based services, Security model for new services, 
Cloud-aware web service security, Information hiding in Cloud Computing, Securing distributed data 
storage in cloud, Security, privacy and trust in mobile computing systems and applications, Middleware 
security & Security features: middleware software is an asset on
its own and has to be protected, interaction between security-specific and other middleware features, e.g.,
context-awareness, Middleware-level security monitoring and measurement: metrics and mechanisms 
for quantification and evaluation of security enforced by the middleware, Security co-design: trade-off and 
co-design between application-based and middleware-based security, Policy-based management: 
innovative support for policy-based definition and enforcement of security concerns, Identification and 
authentication mechanisms: Means to capture application specific constraints in defining and enforcing 
access control rules, Middleware-oriented security patterns: identification of patterns for sound, reusable 
security, Security in aspect-based middleware: mechanisms for isolating and enforcing security aspects, 
Security in agent-based platforms: protection for mobile code and platforms, Smart Devices: Biometrics, 
National ID cards, Embedded Systems Security and TPMs, RFID Systems Security, Smart Card Security, 
Pervasive Systems: Digital Rights Management (DRM) in pervasive environments, Intrusion Detection and 
Information Filtering, Localization Systems Security (Tracking of People and Goods), Mobile Commerce 
Security, Privacy Enhancing Technologies, Security Protocols (for Identification and Authentication, 
Confidentiality and Privacy, and Integrity), Ubiquitous Networks: Ad Hoc Networks Security, Delay- 
Tolerant Network Security, Domestic Network Security, Peer-to-Peer Networks Security, Security Issues 
in Mobile and Ubiquitous Networks, Security of GSM/GPRS/UMTS Systems, Sensor Networks Security, 
Vehicular Network Security, Wireless Communication Security: Bluetooth, NFC, WiFi, WiMAX, 
WiMedia, others 



This Track will emphasize the design, implementation, management and applications of computer 
communications, networks and services. Topics of mostly theoretical nature are also welcome, provided 
there is clear practical potential in applying the results of such work. 

Track B: Computer Science 

Broadband wireless technologies: LTE, WiMAX, WiRAN, HSDPA, HSUPA, Resource allocation and 
interference management, Quality of service and scheduling methods, Capacity planning and dimensioning, 
Cross-layer design and Physical layer based issue, Interworking architecture and interoperability, Relay 
assisted and cooperative communications, Location and provisioning and mobility management, Call 
admission and flow/congestion control, Performance optimization, Channel capacity modeling and analysis, 
Middleware Issues: Event-based, publish/subscribe, and message-oriented middleware, Reconfigurable, 
adaptable, and reflective middleware approaches, Middleware solutions for reliability, fault tolerance, and 
quality-of-service, Scalability of middleware, Context-aware middleware, Autonomic and self-managing 
middleware, Evaluation techniques for middleware solutions, Formal methods and tools for designing, 
verifying, and evaluating middleware, Software engineering techniques for middleware, Service oriented
middleware, Agent-based middleware, Security middleware, Network Applications: Network-based 
automation, Cloud applications, Ubiquitous and pervasive applications, Collaborative applications, RFID 
and sensor network applications, Mobile applications, Smart home applications, Infrastructure monitoring 
and control applications, Remote health monitoring, GPS and location-based applications, Networked 
vehicles applications, Alert applications, Embedded Computer System, Advanced Control Systems, and
Intelligent Control : Advanced control and measurement, computer and microprocessor-based control, 
signal processing, estimation and identification techniques, application specific IC's, nonlinear and 
adaptive control, optimal and robot control, intelligent control, evolutionary computing, and intelligent 
systems, instrumentation subject to critical conditions, automotive, marine and aero-space control and all 
other control applications, Intelligent Control System, Wiring/Wireless Sensor, Signal Control System. 
Sensors, Actuators and Systems Integration : Intelligent sensors and actuators, multisensor fusion, sensor 
array and multi-channel processing, micro/nano technology, microsensors and microactuators, 
instrumentation electronics, MEMS and system integration, wireless sensor, Network Sensor, Hybrid
Sensor, Distributed Sensor Networks. Signal and Image Processing : Digital signal processing theory,
methods, DSP implementation, speech processing, image and multidimensional signal processing, Image 
analysis and processing, Image and Multimedia applications, Real-time multimedia signal processing, 
Computer vision, Emerging signal processing areas, Remote Sensing, Signal processing in education. 
Industrial Informatics: Industrial applications of neural networks, fuzzy algorithms, Neuro-Fuzzy 
application, bioinformatics, real-time computer control, real-time information systems, human-machine
interfaces, CAD/CAM/CAT/CIM, virtual reality, industrial communications, flexible manufacturing 
systems, industrial automated process, Data Storage Management, Hard disk control, Supply Chain
Management, Logistics applications, Power plant automation, Drives automation. Information Technology, 
Management of Information System : Management information systems, Information Management, 
Nursing information management, Information System, Information Technology and their application, Data 
retrieval, Data Base Management, Decision analysis methods, Information processing, Operations research, 
E-Business, E-Commerce, E-Government, Computer Business, Security and risk management, Medical 
imaging, Biotechnology, Bio-Medicine, Computer-based information systems in health care, Changing 
Access to Patient Information, Healthcare Management Information Technology. 
Communication/Computer Network, Transportation Application : On-board diagnostics, Active safety 
systems, Communication systems, Wireless technology, Communication application, Navigation and 
Guidance, Vision-based applications, Speech interface, Sensor fusion, Networking theory and technologies, 
Transportation information, Autonomous vehicle, Vehicle application of affective computing, Advanced
Computing technology and their application : Broadband and intelligent networks, Data Mining, Data
fusion, Computational intelligence, Information and data security, Information indexing and retrieval, 
Information processing, Information systems and applications, Internet applications and performances, 
Knowledge based systems, Knowledge management, Software Engineering, Decision making, Mobile 
networks and services, Network management and services, Neural Network, Fuzzy logics, Neuro-Fuzzy, 
Expert approaches, Innovation Technology and Management : Innovation and product development, 
Emerging advances in business and its applications, Creativity in Internet management and retailing, B2B 
and B2C management, Electronic transceiver device for Retail Marketing Industries, Facilities planning 
and management, Innovative pervasive computing applications, Programming paradigms for pervasive 
systems, Software evolution and maintenance in pervasive systems, Middleware services and agent 
technologies, Adaptive, autonomic and context-aware computing, Mobile/Wireless computing systems and 
services in pervasive computing, Energy-efficient and green pervasive computing, Communication 
architectures for pervasive computing, Ad hoc networks for pervasive communications, Pervasive 
opportunistic communications and applications, Enabling technologies for pervasive systems (e.g., wireless 
BAN, PAN), Positioning and tracking technologies, Sensors and RFID in pervasive systems, Multimodal 
sensing and context for pervasive applications, Pervasive sensing, perception and semantic interpretation, 
Smart devices and intelligent environments, Trust, security and privacy issues in pervasive systems, User 
interfaces and interaction models, Virtual immersive communications, Wearable computers, Standards and 
interfaces for pervasive computing environments, Social and economic models for pervasive systems, 
Active and Programmable Networks, Ad Hoc & Sensor Network, Congestion and/or Flow Control, Content 
Distribution, Grid Networking, High-speed Network Architectures, Internet Services and Applications, 
Optical Networks, Mobile and Wireless Networks, Network Modeling and Simulation, Multicast, 
Multimedia Communications, Network Control and Management, Network Protocols, Network 
Performance, Network Measurement, Peer to Peer and Overlay Networks, Quality of Service and Quality 
of Experience, Ubiquitous Networks, Crosscutting Themes - Internet Technologies, Infrastructure, 
Services and Applications; Open Source Tools, Open Models and Architectures; Security, Privacy and 
Trust; Navigation Systems, Location Based Services; Social Networks and Online Communities; ICT 
Convergence, Digital Economy and Digital Divide, Neural Networks, Pattern Recognition, Computer 
Vision, Advanced Computing Architectures and New Programming Models, Visualization and Virtual 
Reality as Applied to Computational Science, Computer Architecture and Embedded Systems, Technology 
in Education, Theoretical Computer Science, Computing Ethics, Computing Practices & Applications 